Estimation of parametric families for small data sets where a significant portion of the data lay below fixed instrument detection thresholds was investigated. Thus the number of data points was random (an example of Type I censoring). Both analytic and simulation procedures were utilized.

In particular, maximum likelihood techniques, order-statistics techniques, truncation techniques, fill-in with constants, and fill-in with expected values of the missing points were investigated. For exponential data, truncation seemed most appropriate, while for normal and log-normal data, fill-in with expected values (modified to correct for conditioning on the number of data points) was best. The criterion for selection was the total square error.
Among the problems encountered in attempting to analyze data from actual experiments are (1) a significant portion of the data points often fall below the instrument detection thresholds and (2) insufficient data are available to form the population size necessary to validate conclusions reached by standard statistical techniques. Versar addressed these deficiencies via this research effort to provide the Air Force with better techniques to evaluate experiments yielding data thus characterized.

When measuring environmental phenomena, the measuring devices and procedures used are often unable to detect low concentrations. Thus, concentrations below certain threshold levels are not measurable.

Standard "detection limits" are set by various agencies for various phenomena and for various types of measuring devices. Measured values below these limits are reported as "below detection limit" and are thus not available for statistical analysis. (Sometimes values below these limits are available, but their accuracy is greatly in doubt.) Consequently, the statistician often has a very basic problem facing him: how does he analyze data sets which contain a substantial percentage of "below detection limit" entries? This problem is exacerbated by the usual problem of small sample size. As an example, suppose we have taken eight samples of air near a chemical warehouse in order to see if there are leaks. Concentrations below 0.7 parts per billion, say, are below the reliability of the measurement procedure. Of the eight samples, suppose five are below the detection limit while the other three are measured to have concentrations of 1, 2, and 5 parts per billion. How do we find the average concentration?
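As a small illustration (ours, not part of the original study), the Python sketch below computes the "average" for this example under three commonly suggested fill-in constants for the censored entries (zero, L/2, and L) and under simply discarding them. The answers range from 1.0 to about 2.67 ppb, which is exactly the ambiguity the research addresses. The variable and function names are hypothetical.

```python
# The eight-sample air example: detection limit L = 0.7 ppb, five samples
# reported only as "below detection limit", three measured at 1, 2 and 5 ppb.
L = 0.7
measured = [1.0, 2.0, 5.0]
n_below = 5
N = len(measured) + n_below


def fill_in_mean(fill_value):
    """Sample mean after replacing every below-limit entry by fill_value."""
    return (sum(measured) + n_below * fill_value) / N


for label, c in [("zero", 0.0), ("L/2", L / 2), ("L", L)]:
    print(f"fill-in with {label}: average = {fill_in_mean(c):.3f} ppb")

# Discarding the censored entries altogether gives yet another answer.
print(f"measured values only: average = {sum(measured) / len(measured):.3f} ppb")
```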


The  dual  problems  of  small  sample  size  and  sub-detection  limit  data 
can  often  be  encountered  by  statisticians  working  on  Air  Force  problems. 
Examples  are: 


o The determination of the "hardening" characteristics of AWACS and other Air Force systems against nuclear explosions. The tests to simulate segments of a nuclear environment are expensive and provide relatively few data points in small portions of the radiation spectrum. A significant portion of these data could be "real," but could be below the detection limits of the instrumentation used.

o The determination of the effects of Chemical, Biological, and Radiation (CBR) warfare agents against Air Force systems and personnel. Again, due to the nature and costs of tests, data would be relatively sparse. Many of these data, particularly those representing "leakage" and other unintended side-effects, could be below instrument detection thresholds, but would be useful in evaluating the agent effects, particularly in the regime of low dosages over a long exposure period. The CBR requirement will undoubtedly receive more emphasis after the President's recent announcement to resume the development of these agents.

o  The  exposure  of  Air  Force  personnel  to  fumes  released  by  fueling 
operations.  Some  fuels  contain  trace  amounts  of  toxic  substances 
(indications  exist  that  synthetic  fuels,  which  may  receive 
increasing  Air  Force  use,  contain  larger  concentrations  of  such 
substances  than  do  conventional  fuels).  The  exposure 
concentrations,  and  their  effects  over  time,  to  persons 
continually  involved  in  refueling  operations  need  to  be  further 
assessed. Many of these concentrations are frequently below detection equipment thresholds.

We studied the problem of "below detection limit" data coupled with small sample size both theoretically and via computer simulations. We suppose that we are given N data points, p of which are below the detection limit L and N-p of which have reported values larger than L. We suppose that the distribution for the underlying stochastic process is known to belong to a fixed family of distributions depending on an unknown parameter θ. We wish to estimate θ. Among the techniques we used were maximum likelihood techniques, order statistic techniques, truncation techniques, and fill-in procedures using constants or expected values.


Significant  Findings  of  Research  Effort 


The findings of our research effort are embodied in three manuscripts:

1.  Estimation  of  the  mean  for  small  data  sets  of  left-censored 
exponential  data  (Appendix  A) . 

2.  Estimation  of  the  normal  population  parameters  by  order  statistics 
given  a  singly  censored  Type  I  sample  (Appendix  B) . 

3. Estimation of the parameters for small data sets of left-censored normal and lognormal data (Appendix C).

In the first, we investigated exponential data characterized by the dual problems of small sample size and several values reported as "smaller than the limit L." We proposed several estimators for the mean. In particular, we investigated:

1. maximum likelihood estimator (MLE)

2.  modified  MLE,  which  removes  the  conditioning  of  the  MLE  due  to 
knowledge  of  p,  the  number  of  censored  points 

3.  best  linear  invariant  estimator 

4.  best  linear  unbiased  estimator 

5.  fill-in  with  constants 

6.  modified  fill-in  with  constants,  which  removes  the  conditioning 
on  p 

7. fill-in with expected values, which is equivalent to the MLE

8.  truncation. 

To evaluate the performance of these procedures, we performed a simulation. Taking L = 1 (by scaling), we tested θ = 1/3, 2/3, 1, 2, 3, and 5 for sample sizes N = 5, 10, and 15. Using 20,000 data sets for each of the 18 cases, we found that the truncation method was the best while the modified MLE was a close second. Here we used the square error as our criterion for selecting techniques.

The second manuscript deals with estimating the parameters of the censored normal distribution via order statistic techniques. Data from a censored normal have been analyzed many times before using order statistics. However, all these previous studies used Type II censoring: the p smallest observations are missing, where p is fixed a priori. Type I censored data (observations below a fixed value are missing) are usually analyzed by Type II methods. We provide Type I estimators; however, the algorithms fail to converge too often for the method to be practical.

The third manuscript deals with estimating the parameters of the censored normal distribution by techniques other than order statistics. In particular, we investigated:

1.  maximum  likelihood  estimator  (MLE) 

2. modified MLE, which removes the conditioning on p

3.  fill-in  with  constants 

4.  modified  fill-in  with  constants,  which  removes  the  conditioning 
on  p 

5.  fill-in  with  expected  values  of  the  missing  data 

6.  modified  fill-in  with  expected  values,  which  removes  the 
conditioning  on  p 

7. truncation.

To evaluate the performance of these procedures, we performed a simulation. Taking L = 1 (by scaling), we tested several parameter settings for sample sizes N = 5, 10, and 15. Using 50,000 data sets for each of the 21 cases, we found that the modified fill-in with expected values was the best, while the unmodified fill-in with expected values was only marginally worse.

The former had smaller bias but larger variance, leading (in general) to a slight improvement in the total squared error.
Written Manuscripts

The three manuscripts mentioned above will be submitted to appropriate journals for review and possible publication.

Presentation  of  Results 


We presented the results of this research at the following meetings:

1. Operations Research Society of America, spring 1983 national meeting in Chicago, "Small sample, below detection limit exponential data."

2. Operations Research Society of America, fall 1983 national meeting in Orlando, "Normal data and detection limits."

We provided copies of our manuscripts to several individuals, including:

1. Dr. John Beauchamp, Oak Ridge National Laboratory, Tennessee

2. Dr. David Payton, Air Force Weapons Laboratory, Kirtland AFB, New Mexico.


We discussed the material of this study informally with participants in the workshop on reliability held at the University of North Carolina at Charlotte in June 1983.
ABSTRACT


We  study  exponential  data  characterized  by  the  dual  problems  of 
small  sample  size  with  several  values  reported  as  "smaller  than  the  limit 
L."  In  particular,  we  propose  several  estimators  and  report  the  results 
of  a  simulation. 


Key  words:  Below  detection  limit;  Reliability 


INTRODUCTION 


In  reliability  theory,  the  exponential  distribution  plays  an 
important  role.  It  usually  leads  to  simple  formulas  for  the  quantities 
of  interest.  In  this  way  it  may  provide  a  "first  approximation"  to  the 
real-life  situation.  Indeed,  quite  often  it  leads  to  useful  bounds  for 
these  quantities.  So  an  investigation  of  a  problem  in  reliability  theory 
may  begin  with  a  discussion  of  the  exponential  distribution. 

We have in mind the problem of estimating mean shelf-life for objects placed in inventory. Often, to decide whether the object is still operable (or edible or ...), an expensive, destructive, or time-consuming test must be performed. Hence, the number of objects for our experiment is small, and so asymptotic or large-sample-size results are inapplicable. Further, since testing requires money and time, one usually does not continuously monitor for failures from the moment that the objects are placed in inventory. Consequently, some of the units might have failed prior to our testing. Thus our data will be characterized by (1) small samples and (2) reported values for failures above some limit L, but only "below L" for those that failed very quickly. Our problem is to estimate the mean of such a data set if we assume the underlying distribution is exponential.

The problem that we address below is an example of censoring. In general, censoring means that observations at one or both extremes are not available. Our problem is equivalent to "left censoring"; life testing usually involves "right censoring," i.e., the largest values are not available. Two types of censoring in life testing have received much attention. Type I occurs when the test is terminated at a specified time before all the items have failed; Type II occurs when the test is terminated at a particular failure. In Type I censoring, the number of failures as well as the failure times are random variables. This of course makes Type I censoring far more complicated. Consequently, Type II methods have often been applied to Type I data with the hope that the bias is not appreciable. Our problem is analogous to Type I censoring, since the number of units, say p, with failures "below L" is a random variable.
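The distinction can be seen in a small simulation. The Python sketch below is our illustration (the seed, θ = 2, L = 1, N = 10, and the Type II count of 3 are arbitrary choices, and the function names are hypothetical): it left-censors the same exponential samples both ways. Under Type I censoring at the fixed limit L, the number p of censored values changes from data set to data set, while under Type II censoring it is fixed in advance.

```python
import random


def type_i_left_censor(sample, L):
    """Type I: values below the fixed limit L are reported only as 'below L'.
    Returns the sorted observed values and the (random) count p below L."""
    observed = sorted(x for x in sample if x > L)
    return observed, len(sample) - len(observed)


def type_ii_left_censor(sample, p):
    """Type II: the p smallest values are missing, with p fixed in advance."""
    return sorted(sample)[p:], p


rng = random.Random(1983)          # arbitrary seed
theta, L, N = 2.0, 1.0, 10         # illustrative mean, limit and sample size
for trial in range(5):
    sample = [rng.expovariate(1.0 / theta) for _ in range(N)]  # mean theta
    _, p_type_i = type_i_left_censor(sample, L)
    _, p_type_ii = type_ii_left_censor(sample, 3)
    print(f"data set {trial}: p under Type I = {p_type_i}, under Type II = {p_type_ii}")
```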


SECTION 1. THE ESTIMATORS

Suppose we have N identical units on test with time-to-failure exponentially distributed with parameter θ, i.e., time-to-failure has probability density

$$f_\theta(x) = \frac{1}{\theta}e^{-x/\theta}, \qquad x > 0. \qquad (1)$$

We assume 0 < p < N values lie below the (known) limit L. Thus we are given data

$$\{x_1, \ldots, x_K;\ p \text{ values below } L\} \qquad (2)$$

where we have taken

$$K = N - p.$$

We are asked to find the parameter θ. In this section we shall investigate several techniques to estimate θ.

I. Maximum Likelihood Estimator (MLE)

For our data (2) the likelihood function is given by

$$[F_\theta(L)]^p \prod_{i=1}^{K} f_\theta(x_i)$$

where $f_\theta$ is given in (1) and

$$F_\theta(x) = \int_0^x f_\theta(t)\,dt = 1 - e^{-x/\theta}. \qquad (3)$$

Substituting (1) and (3) into the likelihood function and taking logarithms yields

$$\log \text{likelihood} = p\log\big(1 - e^{-L/\theta}\big) - K\log\theta - \frac{\sum x}{\theta}. \qquad (4)$$

Maximizing the expression (4) yields the conditional MLE $\theta^*$.


Proposition 1. The conditional MLE $\theta^*$ exists and is the unique root of

$$\theta = \frac{1}{K}\Big(\sum x - \frac{pL}{e^{L/\theta} - 1}\Big). \qquad (5)$$

It is consistent, asymptotically efficient, and has asymptotic variance
a  K 

N'1[n^UiTiT+  ®a(exP(“L/e)“exP(~xi/e)  )+®2  X  (•*P<-*j_1/6)-e*P(-*j/0))  1 

Proof. Follows from Kulldorff (1961, Theorems 11.1 and 11.2).


The estimator $\theta^*$ is biased. By using a simulation (see Section 2 below) we may view the extent of the bias.

Examples. Let L = 1, θ = 2, N = 15. Suppose $\sum x = K(L+\theta)$, its expected value (see Prop. A.4(e)). Then

p = 4 gives θ* ≈ 2.32
p = 5 gives θ* ≈ 2.15
p = 7 gives θ* ≈ 1.81.
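Equation (5) has no closed-form solution, but K θ minus the right-hand side of (5) (times K) is increasing in θ, so a simple bisection suffices. The Python sketch below is our illustration (bisection and the function names are our choices, not the report's); with the sum of the observed values replaced by its expected value, it reproduces the three values above to two decimals.

```python
import math


def excess(L, theta):
    """L / (exp(L/theta) - 1), computed via expm1 so small theta cannot overflow."""
    u = L / theta
    return L * math.exp(-u) / -math.expm1(-u)


def conditional_mle(sum_x, K, p, L, rel_tol=1e-12):
    """Conditional MLE theta*: the root of K*theta = sum_x - p*L/(exp(L/theta) - 1),
    i.e. equation (5), found by bisection. h below is increasing in theta."""
    def h(theta):
        return K * theta - sum_x + p * excess(L, theta)

    lo, hi = 1e-9, sum_x / K   # h(lo) < 0, and h(hi) >= 0 because every x exceeds L
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# The examples in the text: L = 1, theta = 2, N = 15, with the sum of the
# observed values replaced by its expected value K*(L + theta).
L, theta, N = 1.0, 2.0, 15
for p in (4, 5, 7):
    K = N - p
    print(p, round(conditional_mle(K * (L + theta), K, p, L), 3))
```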


II. Modified MLE

The MLE $\theta^*$ is extremely biased for small samples. The problem is that $\theta^*$ tries to estimate the average value of all the data. Let

$$A = E(\text{average of all data} \mid p).$$

From Proposition A.4(f), we see that

$$A = \theta + L - \frac{p}{N}\,\frac{L}{1 - e^{-L/\theta}},$$

which is not θ. So we suggest modifying $\theta^*$ to form the estimator $\theta_0$, which will satisfy the following implicit formula:

$$\theta_0 + L - \frac{p}{N}\,\frac{L}{1 - e^{-L/\theta_0}} = \theta^*.$$

By using our simulation (see Section 2 below), we see that $\theta_0$ is a very good estimator. Further, if we replace $\sum x$ by its expected value, we see that $\theta_0$ is close, but not equal, to θ. Consequently, it does have some small bias.

Examples. Let L = 1, θ = 2, N = 15. Replacing $\sum x$ by its expected value, we obtain:

p = 4 gives θ₀ = 2.002
p = 5 gives θ₀ = 2.001
p = 7 gives θ₀ = 1.996.
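The two-step calculation is easy to carry out numerically. The Python sketch below is our illustration (bisection as root finder and the function names are our choices): it first solves (5) for θ*, then solves the implicit formula above for θ₀. With Σx replaced by its expected value it reproduces the three values above to within about 0.001. The bracket for the second step relies on θ* > LK/N, which holds because every observed x exceeds L.

```python
import math


def _bisect(f, lo, hi, tol=1e-12):
    """Root of an increasing function f with f(lo) < 0 <= f(hi)."""
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def conditional_mle(sum_x, K, p, L):
    """Step 1: theta* from equation (5)."""
    def f(t):
        return K * t - sum_x + p * L * math.exp(-L / t) / -math.expm1(-L / t)
    return _bisect(f, 1e-9, sum_x / K)


def modified_mle(sum_x, K, p, L):
    """Step 2: theta0 solving theta0 + L - (p/N) L / (1 - exp(-L/theta0)) = theta*."""
    N = K + p
    target = conditional_mle(sum_x, K, p, L)

    def g(t):
        return t + L - (p / N) * L / -math.expm1(-L / t) - target

    hi = max(target, L)
    while g(hi) <= 0:              # g increases without bound when p < N
        hi *= 2.0
    return _bisect(g, 1e-9, hi)    # g(1e-9) is about L*K/N - target < 0


L, theta, N = 1.0, 2.0, 15
for p in (4, 5, 7):
    K = N - p
    print(p, round(modified_mle(K * (L + theta), K, p, L), 3))
```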

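III. Best Linear Invariant Estimator (BLIE)

We let

$$H = \sum_{i=1}^{K} C_ix_i \qquad (9)$$

be an arbitrary linear estimator. We wish to select those coefficients $\{C_1, \ldots, C_K\}$ which minimize the mean square error of $H$ among all invariant $H$, i.e., all $H$ satisfying

$$E(H)/\theta = \text{constant}.$$

From Prop. A.4, we have

$$E(H) = \sum_i C_iE(x_i) = \sum_i C_i(L + E_i\theta) = L\sum_i C_i + \theta\sum_i C_iE_i. \qquad (10)$$

This is a constant times θ if and only if $\sum_i C_i = 0$. Hence, invariance of $H$ is equivalent to

$$\sum_i C_i = 0. \qquad (11)$$

So we wish to minimize $E(H-\theta)^2$ subject to (11). Now from Prop. A.4 we have

$$\begin{aligned} E(H-\theta)^2 &= E\Big(\sum_i\sum_j C_iC_jx_ix_j - 2\theta\sum_i C_ix_i + \theta^2\Big) \\ &= \sum_i\sum_j C_iC_j\big(D_{ij}\theta^2 + E_iE_j\theta^2 + LE_i\theta + LE_j\theta + L^2\big) - 2\theta\sum_i C_i(L + E_i\theta) + \theta^2 \\ &= \sum_i\sum_j C_iC_j(D_{ij} + E_iE_j)\theta^2 - 2\theta^2\sum_i C_iE_i + \theta^2 \qquad (12) \end{aligned}$$

where we have used (10) and (11). So our problem is to minimize (12) subject to (11). Using a Lagrange multiplier λ, our problem is to

$$\min \mathcal{L} = \min\Big(\sum_i\sum_j C_iC_j(D_{ij} + E_iE_j)\theta^2 - 2\theta^2\sum_i C_iE_i - 2\lambda\sum_i C_i\Big). \qquad (13)$$

Setting $\partial\mathcal{L}/\partial C_i = 0$ for each $i$ yields

$$0 = \sum_j C_j(D_{ij} + E_iE_j)\theta^2 - \theta^2E_i - \lambda \qquad (14)$$

and $\partial\mathcal{L}/\partial\lambda = 0$ yields (11). The solution to our system (11) and (14) is

$$C_1 = -1 + \frac{1}{K}, \qquad C_j = \frac{1}{K}\ (j = 2, \ldots, K), \qquad \lambda = -\frac{\theta^2}{K^2}.$$

The verification requires use of Corollary A.3. Hence the BLIE is

$$H^* = -x_1 + \frac{1}{K}\sum_{i=1}^{K} x_i.$$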

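We now have the following result.

Proposition 2. The BLIE is (for N - p = K ≥ 2)

$$H^* = -x_1 + \frac{1}{K}\sum x_i. \qquad (15)$$

It satisfies

$$EH^* = \frac{K-1}{K}\,\theta \qquad (16)$$

$$E(H^* - \theta)^2 = \theta^2/K. \qquad (17)$$

Proof. We note that

$$H^* = \frac{K-1}{K}\Big(-\frac{K}{K-1}x_1 + \frac{1}{K-1}\sum x_i\Big).$$

So from Corollary A.6(b), $H^*$ is distributed as $\frac{\theta}{2K}\chi^2(2(K-1))$. Hence (16) follows. To obtain (17) we have

$$E(H^* - \theta)^2 = \operatorname{var}H^* + \text{bias}^2 = \frac{\theta^2}{4K^2}\cdot 4(K-1) + \Big(\frac{K-1}{K} - 1\Big)^2\theta^2 = \frac{\theta^2}{K}.$$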
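The stated solution of (11) and (14) can also be checked numerically. The Python/NumPy sketch below is our illustration, taking θ = 1 since θ² factors out of (14): it builds $E_i^{(K)}$ and $D_{ij}^{(K)}$ from Corollary A.2, solves the Lagrange conditions as a single linear system, and compares the result with $C_1 = -1 + 1/K$, $C_j = 1/K$, the multiplier λ = -θ²/K², and the mean square error θ²/K of (17). The helper names are hypothetical.

```python
import numpy as np


def exp_order_moments(K):
    """E_i^(K) and D_ij^(K) of Corollary A.2 (unit-mean exponential order statistics)."""
    inv = 1.0 / (K - np.arange(K))             # 1/K, 1/(K-1), ..., 1/1
    E = np.cumsum(inv)                          # E_i = sum_{n<i} 1/(K-n)
    D_diag = np.cumsum(inv ** 2)                # D_i = sum_{n<i} 1/(K-n)^2
    idx = np.arange(K)
    D = D_diag[np.minimum.outer(idx, idx)]      # D_ij = D_min(i,j)
    return E, D


def blie_by_lagrange(K):
    """Solve (14) together with (11), taking theta = 1, as one linear system."""
    E, D = exp_order_moments(K)
    A = np.zeros((K + 1, K + 1))
    A[:K, :K] = D + np.outer(E, E)
    A[:K, K] = -1.0          # the -lambda term of (14)
    A[K, :K] = 1.0           # constraint (11): sum of C_i = 0
    rhs = np.append(E, 0.0)  # the theta^2 E_i term moved to the right-hand side
    sol = np.linalg.solve(A, rhs)
    return sol[:K], sol[K]


K = 6
C, lam = blie_by_lagrange(K)
closed_form = np.r_[-1 + 1 / K, np.full(K - 1, 1 / K)]
print(np.allclose(C, closed_form), np.isclose(lam, -1 / K**2))
```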


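IV. Best Linear Unbiased Estimator (BLUE)

We let

$$G = \sum_{i=1}^{K} B_ix_i \qquad (18)$$

be an arbitrary linear estimator. We wish to select those coefficients $\{B_1, \ldots, B_K\}$ which minimize the variance of $G$ among all unbiased $G$.

From (10) we have

$$EG = L\sum_i B_i + \theta\sum_i B_iE_i.$$

Since we require $EG = \theta$, we require

$$\sum_i B_i = 0 \qquad (19)$$

$$\sum_i B_iE_i = 1. \qquad (20)$$

Hence our problem is $\min E(G-\theta)^2$ subject to (19) and (20). Now from (12), using (19) and (20),

$$E(G-\theta)^2 = \sum_i\sum_j B_iB_jD_{ij}\theta^2 + \theta^2 - 2\theta^2 + \theta^2 = \sum_i\sum_j B_iB_jD_{ij}\theta^2. \qquad (21)$$

Using Lagrange multipliers λ and μ, our problem is to

$$\min\mathcal{L} = \min\Big(\sum_i\sum_j B_iB_jD_{ij}\theta^2 - 2\lambda\sum_i B_i - 2\mu\Big(\sum_i B_iE_i - 1\Big)\Big). \qquad (22)$$

Setting $\partial\mathcal{L}/\partial B_i = 0$ for each $i$ yields

$$0 = \sum_j B_jD_{ij}\theta^2 - \lambda - \mu E_i, \qquad (23)$$

$\partial\mathcal{L}/\partial\lambda = 0$ yields (19), and $\partial\mathcal{L}/\partial\mu = 0$ yields (20). The solution to our system (19), (20), and (23) is

$$B_1 = -1, \qquad B_j = \frac{1}{K-1}\ (j = 2, \ldots, K), \qquad \lambda = -\frac{\theta^2}{K(K-1)}, \qquad \mu = \frac{\theta^2}{K-1}.$$

Hence the BLUE is

$$G^* = -\frac{K}{K-1}x_1 + \frac{1}{K-1}\sum_{i=1}^{K} x_i.$$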


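We have the following result.

Proposition 3. The BLUE is (for N - p = K ≥ 2)

$$G^* = -\frac{K}{K-1}x_1 + \frac{1}{K-1}\sum x_i.$$

It is distributed as $\frac{\theta}{2(K-1)}\chi^2(2(K-1))$. Consequently,

$$\operatorname{var}G^* = \frac{\theta^2}{K-1}.$$

Also the BLIE $H^*$ satisfies

$$H^* = \frac{K-1}{K}\,G^*.$$

Proof. See Corollary A.6.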
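As with the BLIE, the BLUE solution can be confirmed numerically. The sketch below (again our illustration, with θ = 1 and hypothetical helper names) solves (19), (20), and (23) as one linear system and compares the result with $B_1 = -1$, $B_j = 1/(K-1)$, the multipliers λ = -θ²/(K(K-1)) and μ = θ²/(K-1) implied by (23) as written, and var G* = θ²/(K-1).

```python
import numpy as np


def exp_order_moments(K):
    """E_i^(K) and D_ij^(K) of Corollary A.2 (unit-mean exponential order statistics)."""
    inv = 1.0 / (K - np.arange(K))
    E = np.cumsum(inv)
    D_diag = np.cumsum(inv ** 2)
    idx = np.arange(K)
    return E, D_diag[np.minimum.outer(idx, idx)]


def blue_by_lagrange(K):
    """Solve (23) with constraints (19) and (20), taking theta = 1."""
    E, D = exp_order_moments(K)
    A = np.zeros((K + 2, K + 2))
    A[:K, :K] = D
    A[:K, K] = -1.0          # -lambda
    A[:K, K + 1] = -E        # -mu * E_i
    A[K, :K] = 1.0           # (19): sum of B_i = 0
    A[K + 1, :K] = E         # (20): sum of B_i E_i = 1
    rhs = np.zeros(K + 2)
    rhs[K + 1] = 1.0
    sol = np.linalg.solve(A, rhs)
    return sol[:K], sol[K], sol[K + 1], D


K = 6
B, lam, mu, D = blue_by_lagrange(K)
print(B.round(6), lam, mu, B @ D @ B)   # last value is var G* / theta^2
```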

V. Fill-in with Constants Approach

Various constants have been suggested as proxies for the data below L. Pessimists might use zero (i.e., equipment failed immediately) while optimists might argue for L (i.e., equipment failed at the instant we started checking). Those suggesting some sort of balance might use L/2. Let us suppose that we use the value C as a proxy. Then our estimator is

$$\theta^* = \frac{1}{N}\Big(\sum x + pC\Big). \qquad (26)$$

This procedure is very easy to use and is easily understood by the statistically unsophisticated.

Clearly the rule $\theta^*$ is biased. In fact

$$E\theta^* = \frac{1}{N}\big(K(L+\theta) + pC\big) = \theta + L - \frac{p}{N}(\theta + L - C) \qquad (27)$$

using Proposition A.4(e). Moreover,

$$\operatorname{var}\theta^* = \frac{\operatorname{var}(\sum x)}{N^2} = \frac{K}{N^2}\,\theta^2 \qquad (28)$$

from Corollary A.5, a relatively small value since a (sometimes large) part of the data is replaced by a fixed constant. Hence $\theta^*$ has a very narrow spread about the wrong value!

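To improve this technique, we suggest that $\theta^*$ is trying to estimate the right-hand side of (27). We therefore define $\theta_0$ as the solution of

$$\theta_0 + L - \frac{p}{N}(\theta_0 + L - C) = \theta^*. \qquad (29)$$

Our simulation (see Section 2 below) shows that $\theta_0$ is a much improved estimator.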
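A quick Monte Carlo check of (27) and (28) is sketched below in Python (our illustration; the parameter values and the helper name are hypothetical). It uses Prop. A.4(a): given p, the observed values are L plus the order statistics of K exponentials, so their sum can be drawn as KL plus a sum of K independent exponentials. With the "pessimist" proxy C = 0, θ = 2, L = 1, N = 10, and p = 4, the simulated mean sits near 1.8 rather than θ = 2, and the variance near 0.24 is well below the truncation estimator's θ²/K ≈ 0.67.

```python
import random
import statistics


def fill_in_constant_draws(theta, L, N, p, C, reps=50_000, seed=0):
    """Draws of theta* = (sum_x + p*C)/N conditional on p censored values.
    Given p, sum_x = K*L + (sum of K independent exponentials with mean theta)."""
    rng = random.Random(seed)
    K = N - p
    draws = []
    for _ in range(reps):
        sum_x = K * L + sum(rng.expovariate(1.0 / theta) for _ in range(K))
        draws.append((sum_x + p * C) / N)
    return draws


theta, L, N, p, C = 2.0, 1.0, 10, 4, 0.0   # illustrative values; C = 0 is the pessimist proxy
K = N - p
draws = fill_in_constant_draws(theta, L, N, p, C)
print("simulated mean", statistics.fmean(draws),
      "vs (27):", theta + L - p / N * (theta + L - C))
print("simulated variance", statistics.variance(draws), "vs (28):", K * theta**2 / N**2)
```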


VI. Fill-in with Expected Values

Let us fill in the missing data not by constants as in V above but by more appropriate values: their expected values. From Proposition A.4(d) we have

$$E(\text{sum of missing data} \mid p) = p\Big(\theta - \frac{L}{e^{L/\theta} - 1}\Big). \qquad (30)$$

Hence our estimator $\theta^*$ satisfies the equation

$$\theta^* = \frac{1}{N}\Big[\sum x + p\Big(\theta^* - \frac{L}{e^{L/\theta^*} - 1}\Big)\Big]. \qquad (31)$$


After rearranging, this equation is identical to (5), the equation for the MLE! Consequently, the MLE procedure is equivalent to filling in the data points "below L" with their conditional expectations. This interpretation adds credence to our suggestion in II that the MLE needs to be adjusted via the procedures outlined there.
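The equivalence is easy to see numerically. The Python sketch below is our illustration (the fixed-point iteration and the function names are our choices): it repeatedly fills the p missing points with their conditional mean from Proposition A.4(d) under the current estimate and re-averages, as in (31). Each update shrinks the previous change by at least the factor p/N, so the iteration converges, and its limit satisfies the MLE equation (5) up to rounding.

```python
import math


def missing_mean(theta, L):
    """E(X | X < L) for an exponential with mean theta: theta - L/(exp(L/theta) - 1)."""
    u = L / theta
    return theta - L * math.exp(-u) / -math.expm1(-u)


def fill_in_expected(sum_x, K, p, L, theta=1.0, tol=1e-13, max_iter=10_000):
    """Iterate (31): refill the p missing points with their conditional mean
    under the current estimate, then re-average all N points."""
    N = K + p
    for _ in range(max_iter):
        new = (sum_x + p * missing_mean(theta, L)) / N
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta


def mle_residual(theta, sum_x, K, p, L):
    """K times (left side minus right side) of equation (5)."""
    return K * theta - sum_x + p * L / math.expm1(L / theta)


L, N, p = 1.0, 15, 4
K = N - p
sum_x = K * 3.0                   # expected value K*(L + theta) with theta = 2
est = fill_in_expected(sum_x, K, p, L)
print(est, mle_residual(est, sum_x, K, p, L))
```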


VII. Truncation

Our last technique is very easy to conceptualize: forget that data below L have been obtained, and assume that the distribution of the remaining K data points is governed by the truncated exponential distribution

$$\tilde f_\theta(x) = \frac{\frac{1}{\theta}e^{-x/\theta}}{\int_L^\infty \frac{1}{\theta}e^{-t/\theta}\,dt} = \frac{1}{\theta}e^{(L-x)/\theta} \qquad \text{for } x > L. \qquad (32)$$

All "good" estimators (i.e., MLE, BLUE, Minimum Variance Unbiased Estimator) for our truncated distribution (32) are the same:

$$\theta^* = \frac{1}{K}\sum x - L. \qquad (33)$$

It has the following properties.

Proposition 4. The truncated estimator $\theta^*$ is distributed as $\frac{\theta}{2K}\chi^2(2K)$. As such, it is unbiased with variance

$$\operatorname{var}\theta^* = \theta^2/K,$$

which is smaller than that of the BLUE.

Proof. See Corollary A.6(a).


SECTION 2. THE SIMULATION

In order to evaluate the performance of the estimators based on maximum likelihood procedures (parts I and II of Section 1) and on filling-in by constants (part V of Section 1), we performed a simulation. Since the exponential distribution has only a scale parameter θ to estimate, all the formulas depend only on the ratio θ/L. Thus we were free to normalize the simulated data to the case L = 1, with θ = 1/3, 2/3, 1, 2, 3, and 5. We selected N = 5, 10, and 15 as representative small data set sizes.

Using a standard pseudo-random number generator, we simulated 20,000 data sets for each value of N and θ. The data sets were then artificially censored at the cutoff L = 1 and passed to the several estimators to "guess" values for θ. The data sets were then grouped by p, the number of missing data values, and averaged. Typical results are included in Tables 1 and 2 below. The tables include the method of truncation (part VII of Section 1) as a means to check the simulation, since the mean and variance for this method have been theoretically calculated in Proposition 4. We can clearly see that the theoretical values agree quite well with those obtained in the simulation.
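The consistency check described above can be sketched in a few lines of Python. This is our reconstruction of the procedure, not the original simulation code: it uses Python's built-in generator, L = 1, θ = 2, and N = 10 (illustrative choices), groups the data sets by p, and compares the mean and variance of the truncation estimator (33) in each group with θ and θ²/K from Proposition 4.

```python
import random
import statistics
from collections import defaultdict


def simulate_truncation(theta, N, L=1.0, n_sets=20_000, seed=0):
    """Simulate n_sets exponential data sets of size N, censor them at L, and
    group the truncation estimates (33) by p, the number of censored points."""
    rng = random.Random(seed)
    by_p = defaultdict(list)
    for _ in range(n_sets):
        sample = (rng.expovariate(1.0 / theta) for _ in range(N))
        observed = [x for x in sample if x > L]
        if observed:                      # estimator is undefined when K = 0
            by_p[N - len(observed)].append(statistics.fmean(observed) - L)
    return by_p


theta, N = 2.0, 10
by_p = simulate_truncation(theta, N)
for p in sorted(by_p):
    est = by_p[p]
    if len(est) > 1:
        K = N - p
        print(f"p={p}: sets={len(est)}, mean={statistics.fmean(est):.3f} (theory {theta:.3f}), "
              f"var={statistics.variance(est):.3f} (theory {theta**2 / K:.3f})")
```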

(Insert  Tables  1,2  about  here.) 


CONCLUSION

We have presented above several methods to estimate θ based on data sets censored from below. Several are extremely simple, easily calculated, and understood by the mathematically unsophisticated. Among these, the method of truncation

$$\theta^* = \frac{\sum x}{N-p} - L$$

is clearly the best. It is unbiased and has small variance

$$\operatorname{var}\theta^* = \frac{\theta^2}{N-p}.$$

Simulated data show that it performs just about as well as predicted.

For the more sophisticated worker who will use a computer to find an estimator, the modified MLE (part II of Section 1) appears to be slightly superior. It is found via a two-step procedure: $\theta_0$ satisfies

$$\theta_0 + L - \frac{p}{N}\,\frac{L}{1 - e^{-L/\theta_0}} = \theta^*$$

where $\theta^*$ satisfies

$$\theta^* = \frac{\sum x}{N-p} - \frac{pL}{(N-p)\big(e^{L/\theta^*} - 1\big)}.$$

It appears to have little bias, with slightly smaller variance than the method of truncation.

APPENDIX

In this section we investigate the distributions of various random variables associated with our estimators.

Proposition A.1. Let $Y_1, \ldots, Y_K$ be the order statistics for an exponential distribution with parameter θ, i.e., $Y_1, \ldots, Y_K$ are a random sample arranged in ascending order $0 \le Y_1 \le Y_2 \le \cdots \le Y_K$.

Then

a. $Y_1$ has an exponential distribution with parameter $\theta/K$.

b. $Y_{j+1} - Y_j$ has an exponential distribution with parameter $\theta/(K-j)$, $j = 1, \ldots, K-1$.

c. $Y_1, Y_2 - Y_1, \ldots, Y_K - Y_{K-1}$ are independent.

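Corollary A.2. For the situation described in Proposition A.1,

a. $E(Y_j) = \Big(\frac{1}{K} + \frac{1}{K-1} + \cdots + \frac{1}{K-j+1}\Big)\theta$

b. $\operatorname{Cov}(Y_i, Y_j) = \Big(\big(\frac{1}{K}\big)^2 + \big(\frac{1}{K-1}\big)^2 + \cdots + \big(\frac{1}{K-m+1}\big)^2\Big)\theta^2$

where $m = \min(i, j)$.

We let

$$E_j^{(K)} = \frac{1}{K} + \cdots + \frac{1}{K-j+1} = \sum_{n=0}^{j-1}\frac{1}{K-n},$$

$$D_{ij}^{(K)} = \Big(\frac{1}{K}\Big)^2 + \cdots + \Big(\frac{1}{K-m+1}\Big)^2, \qquad m = \min(i, j),$$

$$D_j^{(K)} = D_{jj}^{(K)} = \sum_{n=0}^{j-1}\Big(\frac{1}{K-n}\Big)^2.$$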


Corollary A.3. For the quantities defined above, we have:

a. $\sum_{j=1}^{K} D_{ij}^{(K)} = E_i^{(K)}$

b. $\sum_{i=1}^{K} E_i^{(K)} = K.$

Proof. Suppressing the superscripts, we have the following calculations.

a.
$$\begin{aligned} \sum_{j=1}^{K} D_{ij} &= \sum_{j=1}^{i} D_{ij} + \sum_{j=i+1}^{K} D_{ij} = \sum_{j=1}^{i} D_j + (K-i)D_i \\ &= i\Big(\frac{1}{K}\Big)^2 + (i-1)\Big(\frac{1}{K-1}\Big)^2 + \cdots + 1\Big(\frac{1}{K-i+1}\Big)^2 + (K-i)D_i \\ &= \sum_{n=0}^{i-1}\frac{i-n}{(K-n)^2} + (K-i)\sum_{n=0}^{i-1}\frac{1}{(K-n)^2} = \sum_{n=0}^{i-1}\frac{K-n}{(K-n)^2} = \sum_{n=0}^{i-1}\frac{1}{K-n} = E_i. \end{aligned}$$

b.
$$\sum_{i=1}^{K} E_i = \sum_{i=1}^{K}\sum_{n=0}^{i-1}\frac{1}{K-n} = K\Big(\frac{1}{K}\Big) + (K-1)\Big(\frac{1}{K-1}\Big) + \cdots + 1\Big(\frac{1}{1}\Big) = K.$$
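Both identities are also easy to confirm exactly for small K. The Python sketch below is our illustration, using exact rational arithmetic from the standard `fractions` module: it builds $E_i^{(K)}$ and $D_{ij}^{(K)}$ from their definitions (with 0-based indices) and checks parts (a) and (b).

```python
from fractions import Fraction


def exact_E_D(K):
    """E_i^(K) and D_ij^(K) from Corollary A.2 as exact fractions (0-based indices)."""
    inv = [Fraction(1, K - n) for n in range(K)]            # 1/K, ..., 1/1
    E = [sum(inv[:i], Fraction(0)) for i in range(1, K + 1)]
    D_diag = [sum((x * x for x in inv[:i]), Fraction(0)) for i in range(1, K + 1)]
    D = [[D_diag[min(i, j)] for j in range(K)] for i in range(K)]
    return E, D


def check_corollary_a3(K):
    """Return (part a holds, part b holds) for the given K."""
    E, D = exact_E_D(K)
    part_a = all(sum(D[i], Fraction(0)) == E[i] for i in range(K))
    part_b = sum(E, Fraction(0)) == K
    return part_a, part_b


for K in range(1, 9):
    print(K, check_corollary_a3(K))
```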


We suppose in the remainder of this section that $0 < L < \infty$ is a fixed, given (known) constant.


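Proposition A.4. Let $Y_1 \le \cdots \le Y_N$ be the order statistics from an exponential distribution with parameter θ. Suppose $Y_p < L < Y_{p+1}$ and let $K = N - p$. Let

$$X_j = Y_{p+j}, \qquad j = 1, \ldots, K.$$

Then

a. conditional on p, $\{X_j - L\}$ are the order statistics of a sample of size K from an exponential distribution with parameter θ.

b. $E(X_j \mid p) = E_j^{(K)}\theta + L$

c. $\operatorname{cov}(X_i, X_j \mid p) = D_{ij}^{(K)}\theta^2$

d. $E(Y_1 + \cdots + Y_p \mid p) = p\Big[\theta - \frac{L}{e^{L/\theta} - 1}\Big]$

e. $E(X_1 + \cdots + X_K \mid p) = (N-p)(L+\theta)$

f. $E(Y_1 + \cdots + Y_N \mid p) = N(L+\theta) - \frac{pL}{1 - e^{-L/\theta}}.$

Proof. d.
$$\begin{aligned} E(Y_1 + \cdots + Y_p \mid p) &= E(\text{sum of } p \text{ independent samples each less than } L) = pE(X \mid X < L) \\ &= \frac{p}{1 - e^{-L/\theta}}\int_0^L \frac{x}{\theta}e^{-x/\theta}\,dx = p\Big[\theta - \frac{L}{e^{L/\theta} - 1}\Big]. \end{aligned}$$

e. follows from Corollary A.3(b) and part (b).

f. follows from parts (d) and (e).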
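Part (d) can be spot-checked numerically. The Python sketch below is our illustration (Simpson's rule with 2000 intervals is our choice; the function names are hypothetical): it compares the closed form for E(X | X < L) with direct numerical integration of $x f_\theta(x)$ over [0, L], divided by $F_\theta(L)$.

```python
import math


def mean_below_closed_form(theta, L):
    """Proposition A.4(d) per censored point: theta - L / (exp(L/theta) - 1)."""
    return theta - L / math.expm1(L / theta)


def mean_below_simpson(theta, L, n=2000):
    """E(X | X < L) = (integral of x f(x) over [0, L]) / F(L), by Simpson's rule."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = L / n

    def integrand(x):
        return x * math.exp(-x / theta) / theta

    total = integrand(0.0) + integrand(L)
    total += sum((4 if k % 2 else 2) * integrand(k * h) for k in range(1, n))
    return (total * h / 3) / -math.expm1(-L / theta)   # F(L) = 1 - exp(-L/theta)


for theta in (1 / 3, 1.0, 5.0):
    print(theta, mean_below_closed_form(theta, 1.0), mean_below_simpson(theta, 1.0))
```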




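Corollary A.5. For the situation described in Proposition A.4, we have

$$\operatorname{var}\Big(\sum X \,\Big|\, p\Big) = K\theta^2.$$

Proof.
$$\begin{aligned} \operatorname{var}\Big(\sum X \,\Big|\, p\Big) &= E\Big(\big(\textstyle\sum X\big)^2 \,\Big|\, p\Big) - \Big(E\big(\textstyle\sum X \mid p\big)\Big)^2 \\ &= \sum_i\sum_j E(X_iX_j \mid p) - \big(K(L+\theta)\big)^2 \\ &= \sum_i\sum_j\big(D_{ij}\theta^2 + (L + E_i\theta)(L + E_j\theta)\big) - \big(K(L+\theta)\big)^2 \\ &= K\theta^2 + \big(K(L+\theta)\big)^2 - \big(K(L+\theta)\big)^2 = K\theta^2, \end{aligned}$$

where the last step uses Corollary A.3.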


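Corollary A.6. For the situation described in Proposition A.4, we have

a. $\Big(\frac{1}{K}\sum X - L\Big) \sim \frac{\theta}{2K}\chi^2(2K)$

b. $\Big(-\frac{K}{K-1}X_1 + \frac{1}{K-1}\sum X\Big) \sim \frac{\theta}{2(K-1)}\chi^2(2(K-1)).$

Proof. Let

$$S_1 = K(X_1 - L), \qquad S_j = (K-j+1)(X_j - X_{j-1}), \quad j = 2, \ldots, K.$$

Then

$$E(S_j) = (K-j+1)(EX_j - EX_{j-1}) = (K-j+1)(E_j - E_{j-1})\theta = (K-j+1)\frac{\theta}{K-j+1} = \theta.$$

Thus (by Propositions A.1 and A.4(a)) $\{S_j/\theta\}$ are i.i.d. exponential with parameter 1. Hence $2S_j/\theta \sim \chi^2(2)$, and so

$$\sum_{j=T}^{K} S_j \sim \frac{\theta}{2}\chi^2(2(K-T+1))$$

and

$$\frac{1}{K-T+1}\sum_{j=T}^{K} S_j \sim \frac{\theta}{2(K-T+1)}\chi^2(2(K-T+1)).$$

Part (a) is the case T = 1; part (b) is the case T = 2.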


Table 1. Simulation Results for N = 5, True Mean = 2/3, L = 1
APPENDIX B


ESTIMATION OF THE NORMAL POPULATION PARAMETERS BY
ORDER STATISTICS GIVEN A SINGLY CENSORED TYPE I SAMPLE


Alan Gleit*

Versar Inc.

6850 Versar Center

P. O. Box 1549

Springfield, Virginia 22151


*This work was sponsored by the Air Force Office of Scientific Research under contract F49620-82-C-0079.


We construct the Best Linear Unbiased Estimators for the mean and variance given a Type I censored sample from a normal population. Numerical experience with small data sets indicates that our iterative procedure to find the estimators almost never converges.
INTRODUCTION


The problem of estimating the parameters from a censored normal distribution has been extensively treated in the literature. Two natural censoring mechanisms are: (1) observations below or above a given point may be missing (Type I) and (2) the p smallest or largest observations of a sample of size N may be missing (Type II). Type I censoring is more complex since the number of observations K = N - p is a random variable. Consequently, Type II censoring methodology has been applied to Type I data, though the methods are clearly biased.

One widely used method to estimate the parameters of a normal distribution is based on linear combinations of order statistics. For Type II censoring, the known sample elements are arranged in ascending order, i.e., X(1) < X(2) < ... < X(K), and the method of least squares is applied to get the best linear combination of them. The linear estimators so obtained are unbiased (if K is known a priori) with minimal variance. Important contributions to this methodology include Gupta (1952), Sarhan and Greenberg (1956, 1958), Law (1959), and Dixon (1960). Below we extend this methodology to the case of Type I censoring.


SECTION 1. CONDITIONAL ESTIMATORS


Let

$$X_1 \le X_2 \le \cdots \le X_K$$

be the ordered censored sample of size K out of a complete sample of size N. We assume that the p = N - K censored values are known to lie below the censoring value L. We will first develop the minimal variance unbiased linear estimator (BLUE) conditional on p. Later we shall remove this conditioning.

Our initial problem is to find

$$G = \sum_j G_jX_j \qquad (1)$$

and

$$H = \sum_j H_jX_j \qquad (2)$$

with $E(G) = \mu$, $E(H) = \sigma$, and minimal variance among such linear estimators. To better formulate our problem, let

E(i, R, K) = expected value of the ith order statistic from groups of size K for a standard normal random variable censored from below at R;

cov(i, j, R, K) = covariance of the ith and jth order statistics from groups of size K for a standard normal random variable censored from below at R.


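Using this notation we have

$$E(X_j) = \mu + \sigma E\big(j, (L-\mu)/\sigma, K\big) \qquad (3)$$

$$\operatorname{cov}(X_i, X_j) = \sigma^2\operatorname{cov}\big(i, j, (L-\mu)/\sigma, K\big). \qquad (4)$$

Hence

$$EG = \sum_j G_j\big(\mu + \sigma E(j, (L-\mu)/\sigma, K)\big) \qquad (5)$$

$$\operatorname{var}G = \sigma^2\sum_i\sum_j G_iG_j\operatorname{cov}\big(i, j, (L-\mu)/\sigma, K\big) \qquad (6)$$

with similar formulas for H. Since EG should be μ, we obtain from (5):

$$\sum_j G_j = 1 \qquad\text{and}\qquad \sum_j G_jE\big(j, (L-\mu)/\sigma, K\big) = 0.$$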


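Thus we may formulate our problem as follows.

Proposition 1. The BLUEs for μ and σ solve the following problems.

P1: $\min \sum_i\sum_j G_iG_j\operatorname{cov}\big(i, j, (L-\mu)/\sigma, K\big)$

such that
$$\sum_j G_j = 1 \qquad (7)$$
$$\sum_j G_jE\big(j, (L-\mu)/\sigma, K\big) = 0 \qquad (8)$$

and

P2: $\min \sum_i\sum_j H_iH_j\operatorname{cov}\big(i, j, (L-\mu)/\sigma, K\big)$

such that
$$\sum_j H_j = 0 \qquad (9)$$
$$\sum_j H_jE\big(j, (L-\mu)/\sigma, K\big) = 1. \qquad (10)$$

To solve P1 and P2, let us first write them using the equivalent Lagrangean formulation:

P1': $\min \sum_i\sum_j G_iG_j\operatorname{cov}\big(i, j, (L-\mu)/\sigma, K\big) - 2\alpha_1\Big(\sum_j G_j - 1\Big) - 2\beta_1\sum_j G_jE\big(j, (L-\mu)/\sigma, K\big)$

P2': $\min \sum_i\sum_j H_iH_j\operatorname{cov}\big(i, j, (L-\mu)/\sigma, K\big) - 2\alpha_2\sum_j H_j - 2\beta_2\Big(\sum_j H_jE\big(j, (L-\mu)/\sigma, K\big) - 1\Big).$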


To  write  down  the  solution  to  PI'  and  P2‘,  we  first  introduce  some 
additional  notation.  Let 

X(R.K)  *  matrix  with  (1,3)  coordinate  equal  to  cov  (i.J.R.K) 

B(R.K)  -  vector  with  1th  coordinate  equal  to  B(1,R,K) 

1  *  vector  of  length  K  whose  elements  are  all  the  number  one. 


Finally  let 
H(R,K)  ■ 

Then  the  solutions  to  PI'  and  P2'  are 


X<R.K)  -1'  -B'(R,K) 

10  0 

B(R.K)  0  0 


G 


with  optimal  variances  «n  and  B2. 


(11) 


(12) 
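The bordered system in (11) and (12) can be solved numerically once E(R,K) and Σ(R,K) are known. The following is a minimal NumPy sketch, not the authors' program; the function name `blue_weights` is hypothetical, and the vector `E` and matrix `Sigma` are assumed to be supplied by the caller (for example from tables such as those in Section 3).

```python
import numpy as np

def blue_weights(E, Sigma):
    """Solve the bordered system (11)-(12) for the BLUE weights.

    E     : length-K vector of expected standardized order statistics E(i, R, K)
    Sigma : K x K matrix of their covariances cov(i, j, R, K)
    Returns (G, H, alpha1, beta2); var(G'X) = sigma^2 * alpha1 and
    var(H'X) = sigma^2 * beta2.
    """
    E = np.asarray(E, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    K = E.size
    # M(R, K) = [[Sigma, -1, -E], [1', 0, 0], [E', 0, 0]]
    M = np.zeros((K + 2, K + 2))
    M[:K, :K] = Sigma
    M[:K, K] = -1.0
    M[:K, K + 1] = -E
    M[K, :K] = 1.0
    M[K + 1, :K] = E
    # Right-hand sides (0, 1, 0)' and (0, 0, 1)' encode (7)-(8) and (9)-(10)
    rhs = np.zeros((K + 2, 2))
    rhs[K, 0] = 1.0
    rhs[K + 1, 1] = 1.0
    sol = np.linalg.solve(M, rhs)
    G, alpha1 = sol[:K, 0], sol[K, 0]
    H, beta2 = sol[:K, 1], sol[K + 1, 1]
    return G, H, alpha1, beta2
```

With Σ equal to the identity and E = (-1, 0, 1), G reduces to equal weights (the sample mean) and H to a scaled range, which is a convenient sanity check.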


Unfortunately, the optimal linear combinations G and H depend on the unknown parameters $\mu$ and $\sigma$! Hence our estimates for $\mu$ and $\sigma$ need to be found by iterative procedures.


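The estimates $\mu^*$ and $\sigma^*$ provided by the BLUEs satisfy (11), (12), and

$$\mu^* = \sum_j G_j X_j, \qquad \sigma^* = \sum_j H_j X_j, \qquad \text{with } R = (L-\mu^*)/\sigma^*.$$

Solutions to these four equations can be found provided extensive tables of $M^{-1}(R,K)$ are available. If they were, a useful algorithm is fairly straightforward: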


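1. Let $R_0 = 0$ and $I = 0$.

2. Let $G_{I+1}$ and $H_{I+1}$ be the first K coordinates of $M^{-1}(R_I,K)(\mathbf{0},1,0)'$ and $M^{-1}(R_I,K)(\mathbf{0},0,1)'$, respectively.

3. Let $\mu_{I+1} = \sum_j G_{I+1,j} X_j$ and $\sigma_{I+1} = \sum_j H_{I+1,j} X_j$.

4. Let $R_{I+1} = (L - \mu_{I+1})/\sigma_{I+1}$.

5. If $|\mu_I - \mu_{I+1}|$ and $|\sigma_I - \sigma_{I+1}|$ are small enough, stop. Otherwise, let $I \leftarrow I+1$ and go to step 2.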
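The iteration in steps 1 to 5 can be sketched as follows, assuming NumPy. The callable `moments(R, K)` is a hypothetical stand-in for table lookup (with linear interpolation in R, as described in Section 3); the stopping tolerance 0.01 and the range -3 ≤ R ≤ 3 mirror the settings reported in Section 3. This is an illustration of the scheme, not the authors' code.

```python
import numpy as np

def iterate_blue(x, L, moments, tol=0.01, max_iter=100, r_range=(-3.0, 3.0)):
    """Sketch of the Section 1 iteration for the conditional BLUEs of mu, sigma.

    x       : the K observed (uncensored) values
    moments : hypothetical callable moments(R, K) -> (E, Sigma) standing in
              for the tables E(R, K) and Sigma(R, K)
    Returns (mu, sigma), or None if R leaves r_range or there is no convergence.
    """
    x = np.sort(np.asarray(x, dtype=float))
    K = x.size
    R = 0.0                                   # step 1: R_0 = 0
    mu_old = sigma_old = None
    for _ in range(max_iter):
        E, Sigma = moments(R, K)
        E = np.asarray(E, dtype=float)
        # step 2: build M(R_I, K) and solve for G and H together
        M = np.zeros((K + 2, K + 2))
        M[:K, :K] = Sigma
        M[:K, K] = -1.0
        M[:K, K + 1] = -E
        M[K, :K] = 1.0
        M[K + 1, :K] = E
        rhs = np.zeros((K + 2, 2))
        rhs[K, 0] = 1.0                       # (0, 1, 0)' for G
        rhs[K + 1, 1] = 1.0                   # (0, 0, 1)' for H
        sol = np.linalg.solve(M, rhs)
        G, H = sol[:K, 0], sol[:K, 1]
        mu, sigma = G @ x, H @ x              # step 3
        R = (L - mu) / sigma                  # step 4
        if not (r_range[0] <= R <= r_range[1]):
            return None                       # outside the tables: give up
        if (mu_old is not None and abs(mu - mu_old) < tol
                and abs(sigma - sigma_old) < tol):    # step 5
            return float(mu), float(sigma)
        mu_old, sigma_old = mu, sigma
    return None
```

Returning None when R leaves the tabulated range matches the rule used in the numerical experiments below, where this happened for most simulated samples.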


SECTION  2.  UNCONDITIONAL  ESTIMATORS 


We now, after considerable effort, have $\mu^*$, $\sigma^*$, which clearly depend on K. To obtain unconditional estimators we need to find the expected values of $\mu^*$, $\sigma^*$. For this purpose, we note the following.

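Proposition 3. Let $\phi$ and $\Phi$ denote the standard normal pdf and cdf. Then

1. $E(X \mid X > L) = \mu + \sigma\,\dfrac{\phi((L-\mu)/\sigma)}{1-\Phi((L-\mu)/\sigma)}$  (13)

2. $E(X \mid X < L) = \mu - \sigma\,\dfrac{\phi((L-\mu)/\sigma)}{\Phi((L-\mu)/\sigma)}$  (14)

3. $E(X^2 \mid X > L) = \mu^2 + \sigma^2 + (L+\mu)\,\sigma\,\dfrac{\phi((L-\mu)/\sigma)}{1-\Phi((L-\mu)/\sigma)}$  (15)

4. $E(X^2 \mid X < L) = \mu^2 + \sigma^2 - (L+\mu)\,\sigma\,\dfrac{\phi((L-\mu)/\sigma)}{\Phi((L-\mu)/\sigma)}$  (16)

Letting

$$A = \frac{\phi((L-\mu)/\sigma)}{1-\Phi((L-\mu)/\sigma)} \qquad (17)$$

and

$$B = \frac{\phi((L-\mu)/\sigma)}{\Phi((L-\mu)/\sigma)} \qquad (18)$$

and using (13)-(16) we may easily obtain the following.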


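Corollary 4.

1. $E(\text{sum of all } N \text{ data} \mid p \text{ missing}) = N\mu + \sigma(KA - pB)$

2. $E(\text{sum of squares of all } N \text{ data} \mid p \text{ missing}) = N(\mu^2+\sigma^2) + (L+\mu)\sigma(KA - pB)$

3. $E((\text{sum of all } N \text{ data})^2 \mid p \text{ missing}) = K(K-1)(\mu+\sigma A)^2 + 2pK(\mu+\sigma A)(\mu-\sigma B) + p(p-1)(\mu-\sigma B)^2 + N(\mu^2+\sigma^2) + (L+\mu)\sigma(KA-pB).$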


Corollary 5.

1. $E(\text{average of all } N \text{ data} \mid p \text{ missing}) = \mu + \sigma(KA-pB)/N$  (19)

2. $E(S^2 \text{ for all } N \text{ data} \mid p \text{ missing}) = \mu^2 + \sigma^2 + (L+\mu)\sigma(KA-pB)/N - \dfrac{K(K-1)(\mu+\sigma A)^2 + 2pK(\mu+\sigma A)(\mu-\sigma B) + p(p-1)(\mu-\sigma B)^2}{N(N-1)}$  (20)

Hence the expected values of $\mu^*$ and $\sigma^{*2}$ are not $\mu$ and $\sigma^2$ but the expressions on the right-hand sides of (19) and (20). Another iterative scheme would convert the biased $\mu^*$ and $\sigma^{*2}$ to their unbiased counterparts.
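Formulas (17) to (20) are easy to evaluate directly. This standard-library sketch (the function name `conditional_moments` is our own) computes the right-hand sides of (19) and (20); it assumes N = K + p ≥ 2 so that the divisor N - 1 is nonzero.

```python
import math

def _phi(a):
    """Standard normal pdf."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def _Phi(a):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def _upper(a):
    """1 - Phi(a), computed without subtractive cancellation."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def conditional_moments(mu, sigma, L, K, p):
    """Right-hand sides of (19) and (20); requires N = K + p >= 2."""
    N = K + p
    a = (L - mu) / sigma
    A = _phi(a) / _upper(a)            # (17)
    B = _phi(a) / _Phi(a)              # (18)
    up = mu + sigma * A                # E(X | X > L), (13)
    lo = mu - sigma * B                # E(X | X < L), (14)
    e_mean = mu + sigma * (K * A - p * B) / N                   # (19)
    cross = K * (K - 1) * up**2 + 2 * p * K * up * lo + p * (p - 1) * lo**2
    e_s2 = (mu**2 + sigma**2 + (L + mu) * sigma * (K * A - p * B) / N
            - cross / (N * (N - 1)))                            # (20)
    return e_mean, e_s2
```

For example, with L equal to $\mu$ and one value on each side of L, the expected average is exactly $\mu$ because A = B.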


SECTION  3.  NUMERICAL  EXPERIENCE 


To perform the calculations indicated in Section 1, rather extensive tables of $M^{-1}(R,K)$ would be required. In turn, to find the inverse of the matrix $M(R,K)$ we would need to find the expected values $E(R,K)$ and the covariance matrix $\Sigma(R,K)$ for groups of K order statistics from a standard normal distribution with censor value R. To find these we generated one million normal variates. We then grouped all those values greater than R into groups of size K. The means and covariances were computed as the averages of the values for each such group. Tables were prepared for R = -3.0, -2.0(0.1)+2.0, +3.0 and K = 1(1)15. As an example, we show in Tables I, II, and III below the output for R = 0.1 and K = 1, 2, 3, 4, 5, 9, and 14. For groups of 14 values from the standard normal distribution all above the value R = 0.1, the first (J = 1) order statistic has expected value 0.1763, the second (J = 2) order statistic has expected value 0.2549, etc. Further, the variance of the first (J = 1) order statistic is 0.0053, of the second (J = 2) order statistic is 0.0103, etc. Also, the covariance of the second (J = 2) and first (I = 1) order statistics is 0.0049, etc.

For each possible (R,K) combination $M(R,K)$ and thence $M^{-1}(R,K)$ was found. For values of R not in our tables, linear interpolation was used.
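The table construction described above can be reproduced in outline as follows. This sketch assumes NumPy's default generator rather than whatever generator the authors used, and it uses `np.cov` (divisor n - 1) for the covariances; with about a million variates the difference from plain averaging is negligible. The function name `order_stat_table` is hypothetical.

```python
import numpy as np

def order_stat_table(R, K, n_variates=1_000_000, seed=0):
    """Monte Carlo estimate of E(R, K) and Sigma(R, K).

    Standard normal variates above R are split into consecutive groups of K,
    each group is sorted, and the means and covariances of the resulting
    order statistics are computed across groups.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_variates)
    z = z[z > R]                              # keep only values above R
    n_groups = z.size // K
    groups = np.sort(z[: n_groups * K].reshape(n_groups, K), axis=1)
    E = groups.mean(axis=0)                   # expected order statistics
    Sigma = np.cov(groups, rowvar=False).reshape(K, K)
    return E, Sigma
```

For R = 0.1 and K = 1 the estimates should be close to the first row of Table I (mean 0.8616, variance 0.3425), up to Monte Carlo error.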

To test the value of our estimators, we used various combinations of $\mu$ and $\sigma$. By adjusting the range and scale, we chose L = 1 and

$\mu$ = 1.33; $\sigma$ = 0.2, 0.3
$\mu$ = 1.00; $\sigma$ = 0.1, 0.2, 0.3
$\mu$ = 0.67; $\sigma$ = 0.2, 0.3

with total sample sizes N = 5, 10, and 15. For each ($\mu$, $\sigma$, N) combination we generated 50,000 samples, censored them at the value L = 1, and tried our algorithm on the resulting data sets. In the algorithm of Section 1 we let "small" be 0.01 and stopped if R was ever out of range. Unfortunately, in no ($\mu$, $\sigma$, N) instance did even 10% of the samples converge with $-3 < (1-\mu^*)/\sigma^* < +3$; in fact, only for $\mu$ = 1, $\sigma$ = 0.1 did even 5% converge!

Hence,  the  methodology  described  above,  though  theoretically  useful, 
has  little  practical  value. 


TABLE I. Expected values of standard normal variates above the value 0.1 in groups of K = 1, 2, 3, 4, and 5

TRUNCATION VALUE = 0.1

K   J   I   MEAN X(J)   MEAN X(J)X(I)   MEAN X(J)**3   MEAN X(J)**4   COVARIANCE
1   1   1      0.8616          1.0847         1.7301        3.25E+0       0.3425
SUM OF MEAN: 0.8616

2   1   1      0.5380          0.4209         0.4256        5.14E-1       0.1315
2   2   1                      0.7357                                     0.1027
2   2   2      1.1768          1.7346         3.0215        5.99E+0       0.3501
SUM OF MEAN: 1.7147

3   1   1      0.4134          0.2442         0.1887        1.77E-1       0.0733
3   2   1                      0.3893                                     0.0623
3   2   2      0.7911          0.7818         0.9159        1.23E+0       0.1560
3   3   1                      0.6165                                     0.0502
3   3   2                      1.2093                                     0.1255
3   3   3      1.3700          2.2069         4.0492        8.27E+0       0.3302
SUM OF MEAN: 2.5745

4   1   1      0.3438          0.1643         0.1023        7.74E-2       0.0461
4   2   1                      0.2535                                     0.0408
4   2   2      0.6186          0.4759         0.4341        4.53E-1       0.0933
4   3   1                      0.3671                                     0.0350
4   3   2                      0.6778                                     0.0803
4   3   3      0.9661          1.0926         1.4049        2.01E+0       0.1595
4   4   1                      0.5472                                     0.0286
4   4   2                      0.9979                                     0.0649
4   4   3                      1.5845                                     0.1273
4   4   4      1.5085          2.5878         4.9481        1.04E+1       0.3124
SUM OF MEAN: 3.4370

5   1   1      0.3007          0.1227         0.0647        4.16E-2       0.0323
5   2   1                      0.1841                                     0.0286
5   2   2      0.5172          0.3314         0.2523        2.20E-1       0.0640
5   3   1                      0.2575                                     0.0257
5   3   2                      0.4561                                     0.0573
5   3   3      0.7711          0.6970         0.7181        8.26E-1       0.1024
5   4   1                      0.3513                                     0.0223
5   4   2                      0.6148                                     0.0489
5   4   3                      0.9317                                     0.0879
5   4   4      1.0944          1.3557         1.8653        2.81E+0       0.1581
5   5   1                      0.5038                                     0.0182
5   5   2                      0.8747                                     0.0395
5   5   3                      1.3156                                     0.0703
5   5   4                      1.8963                                     0.1290
5   5   5      1.6150          2.9093         5.7664        1.24E+1       0.3012
SUM OF MEAN: 4.2984


TABLE II. Expected values of standard normal variates above the value 0.1 in groups of K = 9

TRUNCATION VALUE = 0.1   K = 9

 J    I     MEAN X(J)   MEAN X(J)X(I)   MEAN X(J)**3   MEAN X(J)**4   COVARIANCE
 1    1        0.2161          0.0581         0.0194        7.84E-3       0.0115
 2    1                        0.0840                                     0.0108
 2    2        0.3389          0.1383         0.0668        3.73E-2       0.0235
 3    1                        0.1108                                     0.0101
 3    2                        0.1797                                     0.0218
 3    3        0.4659          0.2515         0.1544        1.06E-1       0.0345
 4    1                        0.1399                                     0.0095
 4    2                        0.2249                                     0.0204
 4    3                        0.3130                                     0.0318
 4    4        0.6035          0.4095         0.3075        2.52E-1       0.0453
 5    1                        0.1722                                     0.0088
 5    2                        0.2752                                     0.0189
 5    3                        0.3819                                     0.0295
 5    4                        0.4984                                     0.0419
 5    5        0.7564          0.6315         0.5755        5.67E-1       0.0594
 6    1                        0.2091                                     0.0081
 6    2                        0.3328                                     0.0174
 6    3                        0.4607                                     0.0271
 6    4                        0.5999                                     0.0383
 6    5                        0.7580                                     0.0541
 6    6        0.9307          0.9409         1.0251        1.20E+0       0.0749
 7    1                        0.2536                                     0.0072
 7    2                        0.4020                                     0.0154
 7    3                        0.5557                                     0.0244
 7    4                        0.7230                                     0.0348
 7    5                        0.9119                                     0.0492
 7    6                        1.1293                                     0.0680
 7    7        1.1405          1.4000         1.8372        2.56E+0       0.0994
 8    1                        0.3113                                     0.0062
 8    2                        0.4923                                     0.0135
 8    3                        0.6796                                     0.0216
 8    4                        0.8833                                     0.0308
 8    5                        1.1122                                     0.0437
 8    6                        1.3751                                     0.0606
 8    7                        1.7000                                     0.0890
 8    8        1.4127          2.1374         3.4460        5.89E+0       0.1420
 9    1                        0.4116                                     0.0059
 9    2                        0.6478                                     0.0113
 9    3                        0.8926                                     0.0178
 9    4                        1.1590                                     0.0258
 9    5                        1.4574                                     0.0370
 9    6                        1.7992                                     0.0517
 9    7                        2.2163                                     0.0747
 9    8                        2.7722                                     0.1197
 9    9        1.8779          3.7899         8.1844        1.88E+1       0.2637
SUM OF MEAN: 7.7426


TABLE III. Expected values of standard normal variates above the value 0.1 in groups of K = 14

TRUNCATION VALUE = 0.1   K = 14

 J    I     MEAN X(J)   MEAN X(J)X(I)   MEAN X(J)**3   MEAN X(J)**4   COVARIANCE
 1    1        0.1763          0.0363         0.0089        2.58E-3       0.0053
 2    1                        0.0499                                     0.0049
 2    2        0.2549          0.0753         0.0256        9.96E-3       0.0103
 3    1                        0.0637                                     0.0047
 3    2                        0.0951                                     0.0097
 3    3        0.3349          0.1271         0.0543        2.58E-2       0.0150
 4    1                        0.0787                                     0.0045
 4    2                        0.1166                                     0.0093
 4    3                        0.1551                                     0.0142
 4    4        0.4209          0.1971         0.1019        5.77E-2       0.0200
 5    1                        0.0937                                     0.0043
 5    2                        0.1383                                     0.0090
 5    3                        0.1834                                     0.0135
 5    4                        0.2325                                     0.0190
 5    5        0.5074          0.2821         0.1706        1.1?E-1       0.0247
 6    1                        0.1099                                     0.0041
 6    2                        0.1616                                     0.0086
 6    3                        0.2139                                     0.0129
 6    4                        0.2706                                     0.0160
 6    5                        0.3278                                     0.0233
 6    6        0.6003          0.3901         0.2725        2.03E-1       0.0298
 7    1                        0.1269                                     0.0039
 7    2                        0.1860                                     0.0081
 7    3                        0.2458                                     0.0121
 7    4                        0.3107                                     0.0169
 7    5                        0.3759                                     0.0218
 7    6                        0.4469                                     0.0280
 7    7        0.6980          0.5223         0.4166        3.52E-1       0.0352
 8    1                        0.1452                                     0.0036
 8    2                        0.2124                                     0.0075
 8    3                        0.2805                                     0.0114
 8    4                        0.3542                                     0.0160
 8    5                        0.4283                                     0.0205
 8    6                        0.5086                                     0.0264
 8    7                        0.5941                                     0.0332
 8    8        0.8037          0.6872         0.6221        5.93E-1       0.0414
 9    1                        0.1651                                     0.0033
 9    2                        0.2409                                     0.0070
 9    3                        0.3160                                     0.0106
 9    4                        0.4011                                     0.0149
 9    5                        0.4646                                     0.0190
 9    6                        0.5755                                     0.0247
 9    7                        0.6716                                     0.0311
 9    8                        0.7764                                     0.0368
 9    9        0.9178          0.6907         0.9105        9.76E-1       0.0485
10    1                        0.1876                                     0.0031
10    2                        0.2733                                     0.0065
10    3                        0.3605                                     0.0100
10    4                        0.4544                                     0.0139
10    5                        0.5467                                     0.0176
10    6                        0.6514                                     0.0230
10    7                        0.7597                                     0.0291
10    8                        0.8777                                     0.0364
10    9                        1.0062                                     0.0456
10   10        1.0466          1.1530         1.3322        1.61E+0       0.0572
11    1                        0.2140                                     0.0026
11    2                        0.3112                                     0.0056
11    3                        0.4103                                     0.0067
11    4                        0.5171                                     0.0124
11    5                        0.6243                                     0.0159
11    6                        0.7407                                     0.0210
11    7                        0.8636                                     0.0267
11    8                        0.9970                                     0.0333
11    9                        1.1422                                     0.0417
11   10                        1.3077                                     0.0525
11   11        1.1992          1.5067         1.9768        2.71E+0       0.0686
12    1                        0.2464                                     0.0024
12    2                        0.3579                                     0.0050
12    3                        0.4712                                     0.0076
12    4                        0.5938                                     0.0112
12    5                        0.7169                                     0.0145
12    6                        0.8502                                     0.0192
12    7                        0.9905                                     0.0244
12    8                        1.1429                                     0.0304
12    9                        1.3086                                     0.0381
12   10                        1.4970                                     0.0480
12   11                        1.7236                                     0.0637
12   12        1.3844          2.0053         3.0326        4.77E+0       0.0890
13    1                        0.2904                                     0.0024
13    2                        0.4212                                     0.0047
13    3                        0.5542                                     0.0071
13    4                        0.6980                                     0.0105
13    5                        0.8421                                     0.0134
13    6                        0.9984                                     0.0179
13    7                        1.1626                                     0.0225
13    8                        1.3412                                     0.0284
13    9                        1.5346                                     0.0355
13   10                        1.7542                                     0.0463
13   11                        2.0168                                     0.0560
13   12                        2.3427                                     0.0614
13   13        1.6336          2.7952         5.0017        9.34E+0       0.1269
14    1                        0.3659                                     0.0020
14    2                        0.5305                                     0.0042
14    3                        0.6979                                     0.0066
14    4                        0.8786                                     0.0096
14    5                        1.0598                                     0.0125
14    6                        1.2550                                     0.0160
14    7                        1.4608                                     0.0201
14    8                        1.6844                                     0.0254
14    9                        1.9260                                     0.0315
14   10                        2.1995                                     0.0367
14   11                        2.5256                                     0.0502
14   12                        2.9274                                     0.0696
14   13                        3.4603                                     0.1064
14   14        2.0644          4.4970        10.3215        2.49E+1       0.2358
SUM OF MEAN: 12.0424
ABSTRACT


We study normal and lognormal data characterized by the dual problems of small sample size and several values reported as "smaller than the limit L". In particular, we propose several estimators and report the results of a simulation.

Key words: Below detection limit; MLE; fill-in techniques; Type I censoring; environmental data analysis.


INTRODUCTION 


The reality of detection limits in the measurement of environmental phenomena is undeniable. Concentrations of pollutants are quite often too small to measure and are reported as "not detectable". For such measurements, we know only that the concentration lies below L, the detection limit.

Thus, the problem we address is one of censoring. In general, censoring means that observations at one or both extremes are not available. Our problem is equivalent to "left censoring"; life testing usually involves "right censoring", i.e., the largest values are not available. Two types of life censoring have received much attention. Type I occurs when the test is terminated at a specified time before all the items have failed; Type II occurs when the test is terminated at a particular failure. In Type I censoring the number of failures as well as the failure times are random variables. This, of course, makes Type I censoring far more complicated. Consequently, Type II methods have often been applied to Type I data with the hope that the bias is not appreciable. Our problem is analogous to Type I censoring since the number of measurements, say p, with concentrations below L is a random variable.

Environmental  data  is  characterized  not  only  by  left  censoring  but 
also  by  small  sample  size.  Required  measurements  for  compliance  purposes 
often  are  performed  annually,  quarterly,  or,  at  most,  monthly  due  to  the 
expense  or  disruption  caused  by  the  testing.  Studies  of  pilot  plants  or 
demonstration  plants  are  often  of  such  short  duration  that  five  to  ten 
samples  are  all  that  are  obtained.  Thus,  methods  for  estimating  the 
parameters  of  environmental  data  using  asymptotic  or  large-sample-size 
procedures  are  usually  inapplicable. 

In  sum,  environmental  data  usually  has  the  following  characteristics 
which  make  it  difficult  to  analyze: 

1.  The  data  is  left  censored  with  a  random  number  of  data  values. 

2.  The  sample  size  is  very  small. 


Further,  environmental  data  quite  often  is  well-modelled  by  the  normal  or 
the  log-normal  families.  Consequently,  the  problem  we  address  below  is 
one  of  estimating  the  parameters  of  a  normal  or  log-normal  distribution 
when  the  data  sets  are  characterized  by  (1)  and  (2)  above. 

The problem of estimating the parameters of left censored normal data has been extensively studied. Methods may be categorized as: (1) maximum likelihood estimators, (2) estimators based on linear combinations of order statistics, and (3) others. Maximum likelihood has been studied by, among others, Cohen (1950), Gupta (1952), and Harter and Moore (1966). Linear estimators have been studied by, among others, Gupta (1952), Sarhan and Greenberg (1956, 1958), Saw (1959), and Dixon (1960). Other methods include a method of moments suggested by Ipsen (1949) and the conservative estimator for the mean calculated by replacing all missing data by the truncation point (suggested by the U.S. Environmental Protection Agency).

All of the above techniques have drawbacks. The maximum likelihood procedures, though applicable to Type I data, are inefficient for small data sets and require numerical interpolation in extensive tables. The linear estimators are based on Type II censoring and so are biased for Type I data. Most estimation schemes require extensive tables of coefficients. The moment estimator is extremely inefficient for small data sets while the conservative estimator is extremely biased.

Below  we  shall  evaluate  on  simulated  data  various  estimators,  some 
new  and  some  from  the  literature,  to  determine  which  (if  any)  are 
reasonable. 


SECTION  1.  MAXIMUM  LIKELIHOOD  ESTIMATOR 


In this section we recall the procedures from Gupta (1952). We assume $0 < p < N$ values of our N samples lie below the (known) limit L. Thus we are given data

$$\{X_1, \ldots, X_K,\ p \text{ values below } L\} \qquad (1)$$

where we have taken

$$K = N - p. \qquad (2)$$

We are asked to estimate the mean $\mu$ and standard deviation $\sigma$. For our data (1) the likelihood function is given by

$$\Phi\big((L-\mu)/\sigma\big)^p \prod_{i=1}^{K} \frac{1}{\sigma}\,\phi\big((x_i-\mu)/\sigma\big) \qquad (3)$$

where $\Phi$, respectively $\phi$, is the cdf, respectively pdf, of the standard normal random variable. Taking logarithms yields (up to an additive constant)

$$\text{log likelihood} = p \log \Phi\big((L-\mu)/\sigma\big) - K\log\sigma - \tfrac12 \sum_{i=1}^{K} \big((x_i-\mu)/\sigma\big)^2. \qquad (4)$$

Maximizing the expression (4) yields the maximum likelihood estimators (MLE). Setting the partial derivatives equal to zero yields

$$\mu = \frac1K \sum_i x_i - \frac{p}{K}\,\sigma\,\frac{\phi((L-\mu)/\sigma)}{\Phi((L-\mu)/\sigma)} \qquad (5)$$

$$\frac1K \sum_i (x_i-\mu)^2 = \sigma^2 + \frac{p}{K}\,(L-\mu)\,\sigma\,\frac{\phi((L-\mu)/\sigma)}{\Phi((L-\mu)/\sigma)}. \qquad (6)$$

Let

$$\bar X = \frac1K \sum_i X_i \qquad (7)$$

$$S^2 = \frac1K \sum_i (X_i - \bar X)^2. \qquad (8)$$

The procedure to find $\mu$ and $\sigma$ is then as follows:

1. Calculate $\bar X$, d, $S^2$, and p/K from the data.

2. Calculate D using (19).

3. Find a value of a satisfying (13), (14), and (20). Find the corresponding value for z.

4. Then the estimates follow from (15) and (16):

$$\sigma^* = d/z \qquad (21)$$

$$\mu^* = \bar X + (\sigma^{*2} - S^2)/d. \qquad (22)$$

In order to carry out this algorithm a table is needed giving the values of z for a given pair (D, p/K). Tables can be found in Gupta (1952) and also below as Table 1. Note that when K = 1 we have S = d = 0 and so the above procedures do not produce useful results.
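Instead of interpolating in Gupta's tables, the log likelihood (4) can also be maximized numerically. The sketch below swaps in SciPy's general-purpose Nelder-Mead optimizer for the tabular procedure above; it illustrates that alternative and is not Gupta's method. The function name `censored_normal_mle` is hypothetical, and optimizing over log σ is our choice to keep σ positive.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def censored_normal_mle(x, L, p):
    """Maximize (4) for K observed values x and p values known to lie below L."""
    x = np.asarray(x, dtype=float)
    K = x.size

    def neg_loglik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)             # log scale keeps sigma > 0
        z = (x - mu) / sigma
        ll = (p * norm.logcdf((L - mu) / sigma)
              - K * log_sigma - 0.5 * np.sum(z ** 2))
        return -ll

    s = x.std()
    start = np.array([x.mean(), np.log(s if s > 0 else 1.0)])
    res = minimize(neg_loglik, start, method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 10000})
    mu, log_sigma = res.x
    return float(mu), float(np.exp(log_sigma))
```

With p = 0 this reduces to the sample mean and the divisor-K standard deviation; as noted above, the problem degenerates when K = 1.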


SECTION  2.  TRUNCATED  MAXIMUM  LIKELIHOOD  ESTIMATOR 


Our next technique is very easy to conceptualize: forget that data below L has been obtained and assume that the distribution of the remaining K data points is governed by the truncated normal distribution

$$g(x) = \frac{\phi((x-\mu)/\sigma)}{\sigma\,\big(1 - \Phi((L-\mu)/\sigma)\big)}, \qquad x > L. \qquad (23)$$

The log likelihood for our K data points is

$$\text{log likelihood} = -K\log\big(1-\Phi(a)\big) - K\log\sigma - \tfrac12\sum_{i=1}^{K}\big((x_i-\mu)/\sigma\big)^2$$

where we again used (12):

$$a = (L-\mu)/\sigma.$$

Proceeding as in Section 1 we let

$$B(a) = \frac{\phi(a)}{1-\Phi(a)} \qquad (24)$$

$$z_T(a) = -a + B(a). \qquad (25)$$

Then

$$D = \frac{S^2}{S^2 + d^2} \qquad (26)$$

$$D = \frac{1 - a z_T - z_T^2}{1 - a z_T}. \qquad (27)$$

So we need to modify our previous algorithm in Step 3 to find $z_T$ for a given value of D. Our Table 2 provides the necessary input.
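Step 3 of the modified algorithm can also be carried out without Table 2 by solving (27) for a numerically. This standard-library sketch bisects on a; it assumes that D in (27) increases with a and that d = x̄ - L (the definition of d is not shown in this section), both assumptions of the sketch. Once a is found, σ = d/z_T and μ = L - aσ follow from d = σ z_T and (12). The names `d_of_a` and `truncated_mle` are hypothetical.

```python
import math

def _phi(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def _upper(a):
    """1 - Phi(a) without subtractive cancellation."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def d_of_a(a):
    """Return (D, z_T) from (24), (25), and (27)."""
    B = _phi(a) / _upper(a)                          # (24)
    z = -a + B                                       # (25)
    return (1.0 - a * z - z * z) / (1.0 - a * z), z  # (27)

def truncated_mle(x, L, lo=-10.0, hi=10.0, iters=200):
    """Truncated-normal estimates via (26)-(27); returns (mu, sigma) or None."""
    K = len(x)
    xbar = sum(x) / K
    S2 = sum((xi - xbar) ** 2 for xi in x) / K       # (8), divisor K
    d = xbar - L                                     # assumed definition of d
    D = S2 / (S2 + d * d)                            # (26)
    # D(a) is assumed increasing on [lo, hi]; outside that bracket
    # (e.g. K = 1, where S2 = 0) no finite solution is returned.
    if not (d_of_a(lo)[0] < D < d_of_a(hi)[0]):
        return None
    for _ in range(iters):                           # bisection replaces the table
        mid = 0.5 * (lo + hi)
        if d_of_a(mid)[0] < D:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    z = d_of_a(a)[1]
    sigma = d / z                                    # from d = sigma * z_T
    mu = L - a * sigma                               # from a = (L - mu) / sigma
    return mu, sigma
```

A two-point sample whose mean and variance equal those of a normal(1, 0.3) truncated at L = 1 recovers μ = 1 and σ = 0.3, which is a quick consistency check of (26) and (27).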


SECTION  3.  CONDITIONAL  FILL-IN  TECHNIQUES 


A general technique for dealing with missing values is to replace them with proxies. In this section we describe estimators using constants or using expected values of the censored values as proxies.

1. Fill-In with Constants

Various constants have been suggested as proxies for the data below L. The United States Environmental Protection Agency has a mandate to protect the human population from harmful pollutants. In doing this it usually errs on the side of conservatism. Thus, EPA often suggests that all censored values be replaced by the censoring value L to obtain the clearly most upward-biased, i.e., conservative, estimator for the mean pollutant levels. For pollutant concentrations the most liberal policy is the one that substitutes zero for the censored data: if I cannot measure it, it's not there. Those suggesting some sort of balance might use L/2. Let us suppose that we use the value C as a proxy. Then our "data" are

$$\{X_1, \ldots, X_K, C, \ldots, C\}.$$

Since we have all N values, we would use the usual estimators for the mean and variance:

$$\mu^* = \frac1N\Big(\sum X + pC\Big) \qquad (28)$$

$$\sigma^{*2} = \frac{1}{N-1}\Big(\sum X^2 + pC^2 - N\mu^{*2}\Big). \qquad (29)$$

This procedure is very easy to use and is easily understood by the statistically unsophisticated.
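Estimators (28) and (29) are simply the ordinary sample mean and variance of the filled-in data. A minimal sketch (the function name `fill_in_constant` is hypothetical):

```python
def fill_in_constant(x, p, C):
    """Estimators (28) and (29): each of the p censored values is replaced by C."""
    N = len(x) + p
    mu = (sum(x) + p * C) / N                                    # (28)
    var = (sum(xi * xi for xi in x) + p * C * C - N * mu * mu) / (N - 1)   # (29)
    return mu, var
```

Choosing C = L, C = L/2, or C = 0 gives the conservative, balanced, and liberal policies described above.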


2. Fill-In with Random Order Statistics

As an alternative, we may elect to fill in the censored data with seemingly more appropriate values: their expected values. To develop the formulas, we note the following.

Proposition 1.

1. $E(X \mid X > L) = \mu + \sigma\,\dfrac{\phi((L-\mu)/\sigma)}{1-\Phi((L-\mu)/\sigma)}$  (30)

2. $E(X \mid X < L) = \mu - \sigma\,\dfrac{\phi((L-\mu)/\sigma)}{\Phi((L-\mu)/\sigma)}$  (31)

3. $E(X^2 \mid X > L) = \mu^2 + \sigma^2 + (L+\mu)\,\sigma\,\dfrac{\phi((L-\mu)/\sigma)}{1-\Phi((L-\mu)/\sigma)}$  (32)

4. $E(X^2 \mid X < L) = \mu^2 + \sigma^2 - (L+\mu)\,\sigma\,\dfrac{\phi((L-\mu)/\sigma)}{\Phi((L-\mu)/\sigma)}$  (33)

Thus the expected values of the sum of the censored data and of the sum of squares are p times the right-hand sides of (31) and (33), respectively. Hence $\mu^*$ and $\sigma^*$ must satisfy

$$\mu^* = \frac1N\Big[\sum X + p\mu^* - p\sigma^*\,\frac{\phi((L-\mu^*)/\sigma^*)}{\Phi((L-\mu^*)/\sigma^*)}\Big] \qquad (34)$$

$$\sigma^{*2} = \frac{1}{N-1}\Big[\sum X^2 + p\mu^{*2} + p\sigma^{*2} - p(L+\mu^*)\,\sigma^*\,\frac{\phi((L-\mu^*)/\sigma^*)}{\Phi((L-\mu^*)/\sigma^*)} - N\mu^{*2}\Big]. \qquad (35)$$

Let us simplify these expressions. Recalling our previous definitions (7), (8), for $\bar X$ and $S^2$ and our previous notation (12), (13), for a and A(a), with $a^* = (L-\mu^*)/\sigma^*$, we find that (34) and (35) may be transformed to

$$\mu^* = \bar X - \frac{p}{K}\,\sigma^* A(a^*) \qquad (36)$$

$$\sigma^{*2} = \frac{K}{K-1}\Big[S^2 + (\bar X - \mu^*)(\bar X - L)\Big]. \qquad (37)$$

Except for the factor K/(K-1) in (37) these are identical to the MLE estimators (9) and (16)! Consequently, the MLE procedure is almost equivalent to filling in the censored data with their conditional expectations. To numerically solve (36) and (37), choose starting values $\mu_0$, $\sigma_0$ and use the right-hand sides of (36) and (37) to define $\mu_{j+1}$, $\sigma_{j+1}$ in terms of $\mu_j$, $\sigma_j$.


SECTION  4.  UNCONDITIONAL  ESTIMATORS 


The estimators developed in Sections 1, 2, and 3 are biased. The problem is that they estimate the parameters conditionally on the knowledge of p, the number of censored values. In this section we will readjust the estimators, removing the bias due to conditioning.

1. Fill-In with Constants

Recall that our N data values are

$$\{X_1, \ldots, X_K, C, \ldots, C\}.$$

To compute the expected values of $\mu^*$ and $\sigma^{*2}$ given by (28) and (29) we will use Proposition 1 and definition (24) for B.

Proposition 2.

1. $E\big(\sum X + pC \mid p\big) = K(\mu + \sigma B) + pC$  (38)

2. $E\big(\sum X^2 + pC^2 \mid p\big) = K\big(\mu^2 + \sigma^2 + (L+\mu)\sigma B\big) + pC^2$  (39)

3. $E\big((\sum X + pC)^2 \mid p\big) = K(K-1)(\mu+\sigma B)^2 + 2pKC(\mu+\sigma B) + p(p-1)C^2 + K\big(\mu^2+\sigma^2+(L+\mu)\sigma B\big) + pC^2$  (40)

Corollary 3.

1. $E(\mu^* \mid p) = \big(K(\mu+\sigma B) + pC\big)/N$  (41)

2. $E(\sigma^{*2} \mid p) = \big[K(\mu^2+\sigma^2+(L+\mu)\sigma B) + pC^2\big]/N - K(K-1)(\mu+\sigma B)^2/N(N-1) - 2pKC(\mu+\sigma B)/N(N-1) - C^2 p(p-1)/N(N-1)$  (42)


So, given data, first compute $\mu^*$ and $\sigma^*$. Using these values, unbiased estimates $\mu_0$, $\sigma_0$ may be found by solving

$$\mu^* = \big(K(\mu_0 + \sigma_0 B_0) + pC\big)/N$$

$$\sigma^{*2} = \big[K(\mu_0^2 + \sigma_0^2 + (L+\mu_0)\sigma_0 B_0) + pC^2\big]/N - \frac{K(K-1)(\mu_0+\sigma_0B_0)^2 + 2pKC(\mu_0+\sigma_0B_0) + p(p-1)C^2}{N(N-1)}$$

where

$$B_0 = B\big((L-\mu_0)/\sigma_0\big).$$

Values for $\mu_0$, $\sigma_0$ satisfying these equations to any pre-set degree of accuracy may be easily obtained by use of a computer. As an example, we initialize by

$$\mu_1 = \mu^*, \qquad \sigma_1 = \sigma^*$$

and then update by

$$B_j = B\big((L-\mu_j)/\sigma_j\big)$$

$$\mu_{j+1} = (N\mu^* - pC - \sigma_j K B_j)/K$$

$$\sigma_{j+1}^2 = N\sigma^{*2}/K + (K-1)(\mu_{j+1}+\sigma_jB_j)^2/(N-1) + 2pC(\mu_{j+1}+\sigma_jB_j)/(N-1) - pC^2/(N-1) - \mu_{j+1}^2 - (L+\mu_{j+1})\sigma_j B_j.$$
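Corollary 3 and the update scheme above translate directly into code. In this standard-library sketch the names `expected_constant_fill` and `unbias_constant_fill` are hypothetical, and the tolerance and iteration cap are assumptions. Because Section 7 reports that this modified scheme often fails to converge, the function returns None in that case rather than a value.

```python
import math

def _B(a):
    """B(a) = phi(a) / (1 - Phi(a)), definition (24)."""
    return (math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
            / (0.5 * math.erfc(a / math.sqrt(2.0))))

def expected_constant_fill(mu, sigma, L, K, p, C):
    """Right-hand sides of (41) and (42); requires N = K + p >= 2."""
    N = K + p
    B = _B((L - mu) / sigma)
    m = mu + sigma * B                               # E(X | X > L)
    e_mu = (K * m + p * C) / N                       # (41)
    e_var = ((K * (mu ** 2 + sigma ** 2 + (L + mu) * sigma * B) + p * C ** 2) / N
             - (K * (K - 1) * m ** 2 + 2 * p * K * C * m
                + p * (p - 1) * C ** 2) / (N * (N - 1)))   # (42)
    return e_mu, e_var

def unbias_constant_fill(mu_star, var_star, L, K, p, C, tol=1e-10, max_iter=1000):
    """Update scheme above, started at (mu*, sigma*); None if it fails."""
    N = K + p
    mu, sigma = mu_star, math.sqrt(var_star)
    for _ in range(max_iter):
        B = _B((L - mu) / sigma)
        mu_new = (N * mu_star - p * C - sigma * K * B) / K
        m = mu_new + sigma * B
        var_new = (N * var_star / K + (K - 1) * m ** 2 / (N - 1)
                   + 2 * p * C * m / (N - 1) - p * C ** 2 / (N - 1)
                   - mu_new ** 2 - (L + mu_new) * sigma * B)
        if var_new <= 0:
            return None                              # no valid update
        sigma_new = math.sqrt(var_new)
        if abs(mu_new - mu) < tol and abs(sigma_new - sigma) < tol:
            return mu_new, sigma_new
        mu, sigma = mu_new, sigma_new
    return None
```

When L lies far below the data and p = 0, B is essentially zero and the scheme returns $(\mu^*, \sigma^*)$ unchanged, as it should.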


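2. Fill-In with Random Order Statistics

We let $Y_1 < Y_2 < \cdots < Y_p < L$ be the (random) order statistics of the censored values. We then use Proposition 1 and definitions (13) and (24) for A and B to obtain the following.

Proposition 4.

1. $E\big(\sum X + \sum Y \mid p\big) = N\mu + \sigma(KB - pA)$

2. $E\big(\sum X^2 + \sum Y^2 \mid p\big) = N(\mu^2+\sigma^2) + \sigma(L+\mu)(KB - pA)$

3. $E\big((\sum X + \sum Y)^2 \mid p\big) = K(K-1)(\mu+\sigma B)^2 + 2pK(\mu+\sigma B)(\mu-\sigma A) + p(p-1)(\mu-\sigma A)^2 + N(\mu^2+\sigma^2) + \sigma(L+\mu)(KB-pA)$

Using these results, we can compute the expected values of our estimators $\mu^*$ and $\sigma^*$ as given by (34) and (35).

Corollary 5.

1. $E(\mu^* \mid p) = \mu + \sigma(KB - pA)/N$

2. $E(\sigma^{*2} \mid p) = \big[N(\mu^2+\sigma^2) + \sigma(L+\mu)(KB-pA)\big]/N - K(K-1)(\mu+\sigma B)^2/N(N-1) - 2pK(\mu+\sigma B)(\mu-\sigma A)/N(N-1) - p(p-1)(\mu-\sigma A)^2/N(N-1)$

We may find unbiased estimators $\mu_0$ and $\sigma_0$ (to any degree of accuracy) for this case in a fashion similar to that of case 1 above. One scheme puts

$$\mu_{j+1} = \mu^* - \sigma_j(KB_j - pA_j)/N \qquad (54)$$

$$\sigma_{j+1}^2 = \sigma^{*2} + \frac{K(K-1)(\mu_{j+1}+\sigma_jB_j)^2 + 2pK(\mu_{j+1}+\sigma_jB_j)(\mu_{j+1}-\sigma_jA_j) + p(p-1)(\mu_{j+1}-\sigma_jA_j)^2}{N(N-1)} - \frac{\sigma_j(L+\mu_{j+1})(KB_j-pA_j)}{N} - \mu_{j+1}^2 \qquad (55)$$

with $A_j$ and $B_j$ evaluated at $(L-\mu_j)/\sigma_j$.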

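3. Maximum Likelihood Estimators

We noted above that the maximum likelihood estimators (9) and (16) were virtually identical to the Fill-In with expected value estimators (36) and (37). Consequently, we can find the expected values for the maximum likelihood estimators as follows.

Lemma 6.

$$E(S^2) = \sigma^2 + (L-\mu)\sigma B - \sigma^2 B^2. \qquad (56)$$

Proposition 7.

1. $E(\mu^* \mid p) = \mu + \sigma(KB - pA)/N$  (57)

2. With $\sigma^{*2}_{\mathrm{E.V.}}$ denoting the fill-in with expected values estimator (37),

$$E(\sigma^{*2} \mid p) = E\Big(\tfrac{K-1}{K}\,\sigma^{*2}_{\mathrm{E.V.}} + \tfrac1K S^2 \,\Big|\, p\Big)$$

$$= \frac{K-1}{K}(\mu^2+\sigma^2) + \frac{K-1}{KN}\sigma(L+\mu)(KB-pA) - \frac{(K-1)^2}{N(N-1)}(\mu+\sigma B)^2 - \frac{2p(K-1)}{N(N-1)}(\mu+\sigma B)(\mu-\sigma A) - \frac{p(p-1)(K-1)}{N(N-1)K}(\mu-\sigma A)^2 + \frac{\sigma^2 + (L-\mu)\sigma B - \sigma^2 B^2}{K}. \qquad (58)$$

We may find unbiased estimators $\mu_0$ and $\sigma_0$ (to any degree of accuracy) for this case in a fashion similar to case 1 above. One scheme puts

$$\mu_{j+1} = \mu^* - \sigma_j(KB_j - pA_j)/N \qquad (59)$$

$$\sigma_{j+1}^2 = \sigma^{*2} + \frac{(K-1)^2}{N(N-1)}(\mu_{j+1}+\sigma_jB_j)^2 + \frac{2p(K-1)}{N(N-1)}(\mu_{j+1}+\sigma_jB_j)(\mu_{j+1}-\sigma_jA_j) + \frac{p(p-1)(K-1)}{N(N-1)K}(\mu_{j+1}-\sigma_jA_j)^2 - \frac{K-1}{KN}\sigma_j(L+\mu_{j+1})(KB_j-pA_j) - \frac{K-1}{K}\mu_{j+1}^2 - \frac{(L-\mu_{j+1})\sigma_jB_j - \sigma_j^2B_j^2}{K}. \qquad (60)$$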


SECTION  5.  ORDER  STATISTIC  TECHNIQUES 


Previous work on linear estimators is for Type II censoring, i.e., for samples with fixed sample sizes rather than fixed censoring points. These estimators have often been used in Type I situations with the hope that the resulting bias is small. We have investigated linear estimators for singly censored Type I samples elsewhere (Gleit 1983) and reported their very poor performance.


SECTION  6.  LOGNORMAL  DATA 


If our sample were from a lognormal distribution, then we could apply the techniques described above to the logarithms of the data to estimate $\mu$ and $\sigma$ for the resulting normal distribution. Then the parameters of the lognormal could be estimated by using the following facts:

$$\text{mean of lognormal} = \exp(\mu + \sigma^2/2)$$

$$\text{variance of lognormal} = \exp(2\mu + \sigma^2)\big(\exp(\sigma^2) - 1\big).$$
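The two lognormal facts can be applied to any pair of normal-scale estimates. A minimal sketch (the function name `lognormal_moments` is hypothetical):

```python
import math

def lognormal_moments(mu, sigma):
    """Mean and variance of exp(Y) when Y is normal with mean mu and sd sigma."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    var = math.exp(2.0 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
    return mean, var
```

For $\mu$ = 0.67 and $\sigma$ = 0.2 the mean is exp(0.69), about 1.99, which is the lognormal mean used in Section 7.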


SECTION  7.  SIMULATED  DATA 


To evaluate the performance of our estimators we performed a simulation. By rescaling and changing origins, all the formulas depend on choosing two of the three parameters: $\mu$, $\sigma$, and L. We normalized the simulated data to L = 1 and selected the following seven combinations for $\mu$ and $\sigma$:

$\mu$ = 0.67: $\sigma$ = 0.2, 0.3

$\mu$ = 1.00: $\sigma$ = 0.1, 0.2, 0.3

$\mu$ = 1.33: $\sigma$ = 0.2, 0.3.

We selected N = 5, 10, and 15 as representative small data set sizes.

Using a standard pseudo-random number generator and the Box-Muller transformation, we generated one million standard normal random variates. Using these variates, we generated 50,000 data sets for each of the twenty-one combinations of N, $\mu$, and $\sigma$. These data sets were then artificially censored at the cutoff L = 1 and passed to the several estimators to "guess" values for $\mu$ and $\sigma$. The data sets were then grouped by the value of p, the number of censored values, p = 0, 1, ..., N - 1.

For each technique, each p, each N, and each ($\mu$, $\sigma$) combination we computed the mean and variance of the estimators for $\mu$, for $\sigma$, and for the mean and variance of the lognormal distribution whose logarithms follow the normal ($\mu$, $\sigma$) distribution.
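The simulation design above can be outlined as follows. This NumPy sketch is not the authors' code: it uses NumPy's default generator instead of the Box-Muller transformation, and `estimator` is any caller-supplied function of the observed values and p. The per-p summary reports bias, variance, and square error E(θ̂ - θ)² = Bias² + Variance, the criterion used in this section. The name `simulate_by_p` is hypothetical.

```python
import numpy as np

def simulate_by_p(estimator, mu, sigma, N, L=1.0, n_sets=50_000, seed=0):
    """Generate n_sets normal samples of size N, censor below L, group by p.

    estimator(observed, p) -> estimate of mu, or None on failure.
    Returns {p: (bias, variance, square_error, n_used)} for p = 0, ..., N - 1.
    """
    rng = np.random.default_rng(seed)
    data = mu + sigma * rng.standard_normal((n_sets, N))
    results = {}
    for row in data:
        observed = row[row >= L]                 # values below L are censored
        p = N - observed.size
        if p == N:                               # nothing observed: skip
            continue
        est = estimator(observed, p)
        if est is not None:                      # count only converged runs
            results.setdefault(p, []).append(est)
    summary = {}
    for p, ests in sorted(results.items()):
        ests = np.asarray(ests, dtype=float)
        bias = ests.mean() - mu
        var = ests.var()                         # divisor n, so MSE = bias^2 + var
        mse = np.mean((ests - mu) ** 2)
        summary[p] = (bias, var, mse, ests.size)
    return summary
```

For example, `simulate_by_p(lambda x, p: (x.sum() + 0.5 * p) / (x.size + p), 1.33, 0.2, 5)` evaluates fill-in with the constant 0.5 for the first case reported in Table 3.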

Typical results are reported in Tables 3, 4, and 5 below. Table 3 reports the results for estimating the mean from data with N = 5, $\mu$ = 1.33, and $\sigma$ = 0.2. The sample sizes are large enough only for p = 0, 1, and 2. We see that the modified MLE and modified fill-in with constants routines failed to converge for p = 2 and did a fair job of converging for p = 1. The truncation method had an unacceptably large variance. The MLE was very biased high; the modified version did not noticeably improve the estimator. The expected value estimator did a reasonable job, while its modified form decreased the bias at the expense of added variance.

Table 4 also reports the results for estimating the mean, but from data with N = 10, $\mu$ = 1.00, and $\sigma$ = 0.3. The sample sizes are large enough for all values of p from 1 to 9 (i.e., all values of interest except p = 0). Again the modified MLE and modified fill-in with constants routines did not converge very often. The truncation method again has large variance, the MLE is biased high, the expected value estimator does very well, and the modified expected value estimator does the best. Finally, Table 5 reports the results for the mean for simulated lognormal data with mean 1.99, N = 5, corresponding to $\mu$ = 0.67 and $\sigma$ = 0.2 for the underlying normal. The results are essentially the same as in Tables 3 and 4.

The results are very consistent throughout all the twenty-one cases for each of the four possible quantities estimated: normal mean and variance, lognormal mean and variance. The expected value estimator does a very good job while its modified form reduces the bias but increases the variance. These procedures converge just about all the time. The MLE is usually highly biased, has a large variance, and is not usable for the case K = 1 (i.e., only one data point). The modified MLE almost never converged; even when it did, it did a poor job. The truncation method always had an unacceptably large variance; it was also very biased for p/N large and not usable for K = 1. Fill-in with constants did not perform very well. For small p, fill-in with 0.5 did not do too badly; for large p, the estimator virtually agrees with the constant and so is of no value.

The modified form almost never converges. Using the criterion of minimum square error, i.e.,

$$\text{square error} = E(\hat\theta - \theta)^2 = \text{Bias}^2 + \text{Variance},$$

in general the modified expected value estimator is best, with fill-in by expected values coming in a close second.


CONCLUSION 

We have presented above several methods to estimate the mean and variance of a normal distribution based on data sets censored from below. Several are extremely simple, most require extensive computer calculations, and some require extensive tables. Among these, the modified fill-in by expected values (54) and (55) is our choice, with fill-in by expected values (36) and (37) a close second. Though far more biased, this latter approach has lower variance.
TABLE 4. ESTIMATES FOR MEAN. N=10, MEAN=1.00, STD. DEVIATION=0.3